ON THE THEORY OF SYSTEMATIC SAMPLING, II


By William G. Madow¹
Institute of Statistics, University of North Carolina 

1. Summary and introduction. In an earlier paper² [1] an approach to the
problem of systematic sampling was formulated, and the associated variance was
obtained. Several forms of the population were assumed. The efficiency of the
systematic design as compared with the random and stratified random designs
was evaluated for these forms. It was remarked that as the size of sample
increased the variance of a systematic design might also increase, contrary to
the behavior of variances in the random sampling design. This possibility was
verified in [2].

One approach to the study of systematic designs, given by Cochran [3], removed
this difficulty to some extent by changing the problem to one of the expected
variance, and supposing the elements of the population to be random variables.
He showed that if the correlogram of these random variables is concave upwards,
then the expected variance of the systematic design is less, and often
considerably less, than that of a stratified random design.

In the present paper the results of the earlier papers are extended to the
systematic sampling of clusters of equal and unequal sizes. Some comments on
systematic sampling in two dimensions are included.

In section 2 we derive two theorems that have considerable applications in
many parts of sampling. Although workers in sampling theory have long told each
other that these theorems ought to be true, no reference to them seems to exist.

In section 3 we develop the implications of a remark [1, p. 13] that in designing
sample surveys we should try to induce negative correlation between strata. In
Theorem 3 we obtain sufficient conditions for the correlation to be negative.
The lemma and Theorem 4 given in Section 4 enable us to extend the uses of
Theorem 3 in practice. As an application of these results, we show that if a
population has a concave upwards correlogram, and if strata are defined in an
optimum fashion for the selection of one element at random from each stratum,
then we can define a systematic type design that will be more efficient than
independent random selection from each stratum.


In sections 5 and 6 we obtain various results in the systematic sampling of
clusters, largely as applications of the more general theorems of the earlier
sections. In general the results are of a nature similar to those of [1] and [3] in
that the formulae show the conditions under which systematic sampling may be
expected to be more efficient than random or stratified random sampling. We
have not, however, applied these formulae to specified types of populations.


1 Submitted for publication, November, 1948. Parts of this paper were prepared while 
the author was Visiting Professor of Statistics at the University of Sao Paulo, Brazil. 
² References to the articles and book cited are given by Roman numerals.
From [1], [2] and [3] it is already apparent that such applications will be useful, and
they should be more valuable when made in connection with important types
of surveys or data than when made as illustrations in a general paper.


2. Random events and conditional expectations. Almost invariably, samples
are selected in several stages. For example, to select a sample of households from
a city, one frequently used method is the following two-stage sampling plan:

a. A map of the city showing the location of each block is obtained and
brought up to date.

b. Using this map, a sample of the blocks of the city is selected (this is stage 1).

c. From the households on the blocks selected in stage 1, a subsample of
households is selected (this is stage 2).

In this section, we give a general approach for evaluating the means and
variances associated with multi-stage sampling. This approach has the
advantage of at once yielding the contributions to the variance arising from
each stage. Furthermore, the theorems presented are useful in calculating
variances even when our interest is not in multi-stage sampling. The theorems are
presented in general terms because of their wide application in sampling.

We shall say that the result of performing an operation is a random event $A^*$
if the result can assume $m$ possible states $A_1, \dots, A_m$ with probabilities
$p_1, \dots, p_m$, where

$$P\{A^* = A_i\} = p_i, \qquad \sum_{i=1}^{m} p_i = 1,$$

and $P\{A^* = A_i\}$ is read "the probability that the random event $A^*$ assumes
the state $A_i$."

One illustration of an operation is the operation of selecting a sample of blocks.
If there are $N$ blocks in the city, of which we select $n$ in such a way that each
set of $n$ of the $N$ blocks is a possible sample, then there are $\binom{N}{n}$ possible
samples. In this case $m = \binom{N}{n}$ and the $\binom{N}{n}$ possible samples are the
$m$ states of $A^*$, "the result of selecting the sample of blocks." Furthermore, if
each of the possible samples of blocks is equally likely to be selected, then

$$P\{A^* = A_i\} = \frac{1}{\binom{N}{n}}, \qquad i = 1, \dots, m.$$
The random event $A^*$ may also be the taking on by a random variable of
one of its possible values. If $z^*$ is a random variable having possible values
$z_1, \dots, z_m$ with probabilities $p_1, \dots, p_m$, then we can define the states of
$A^*$ to be $A_i$, where $A_i$ is "$z^* = z_i$."
Thus the notion of a random event includes the two types of randomness
that are met in selecting samples.
Let $x'$ be a random variable. Then, by the conditional expectation of $x'$ subject
to the random event $A^*$ is meant the random variable $E^*(x' \mid A)$ whose possible
values are $E(x' \mid A_i)$, $i = 1, \dots, m$, and whose probabilities are $p_i$, that is,

$$P\{E^*(x' \mid A) = E(x' \mid A_i)\} = p_i = P\{A^* = A_i\},$$

where

$$E(x' \mid A_i) = \sum_{j=1}^{N_i} x_{ij}\,p_{ij}(A_i), \tag{2.1}$$

$x_{ij}$ is the $j$th of the $N_i$ possible values of $x'$ when $A_i$ occurs, and

$$p_{ij}(A_i) = P\{x' = x_{ij} \mid A_i\}$$

is "the probability that $x' = x_{ij}$ given that $A_i$ occurs." It should be noted
that if

$$p_{ij} = P\{x' = x_{ij}\},$$

then

$$p_{ij} = P\{x' = x_{ij},\ A^* = A_i\},$$

since the fact that $x' = x_{ij}$ implies the occurrence of $A_i$. Then

$$p_i\,p_{ij}(A_i) = p_{ij}. \tag{2.2}$$

We state Theorems 1 and 2 without proof since their proofs are immediate.

THEOREM 1. The expected value of the random variable $E^*(x' \mid A)$ is $Ex'$, i.e.,

$$E\{E^*(x' \mid A)\} = Ex'.$$
By $\sigma^*_{x'y'\mid A}$ we shall mean the random variable whose possible values are
$\sigma_{x'y'\mid A_i}$, $i = 1, \dots, m$, where

$$\sigma_{x'y'\mid A_i} = E\{[x' - E(x' \mid A_i)][y' - E(y' \mid A_i)] \mid A_i\}$$

and

$$P\{\sigma^*_{x'y'\mid A} = \sigma_{x'y'\mid A_i}\} = p_i = P\{A^* = A_i\},$$

that is,

$$\sigma^*_{x'y'\mid A} = E^*\{[x' - E^*(x' \mid A)][y' - E^*(y' \mid A)] \mid A\}.$$

Furthermore, the symbol $\sigma_{E^*(x'\mid A)E^*(y'\mid A)}$ will stand for "the covariance of the
two random variables $E^*(x' \mid A)$ and $E^*(y' \mid A)$." The corresponding definitions
of variance are obtained by replacing $y'$ by $x'$ above.

THEOREM 2. If $x'$ and $y'$ are random variables, then

$$\sigma_{x'y'} = E\sigma^*_{x'y'\mid A} + \sigma_{E^*(x'\mid A)E^*(y'\mid A)}$$

and

$$\sigma^2_{x'} = E\sigma^{*2}_{x'\mid A} + \sigma^2_{E^*(x'\mid A)}.$$

We note that, since the $p_{ij}$, $p_i$ and $p_{ij}(A_i)$ are not specified, Theorems 1 and
2 are valid for any two-stage plan. The generalizations of Theorems 1 and 2
to multi-stage plans are obvious, but in practice it often turns out to be simpler
to apply the theorems several times.
It would be easy to give applications of Theorems 1 and 2, but these are not
essential for our purposes in this paper. As remarked in the introduction, these
two theorems have long been part of what we may call the folklore of sampling.
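As a check on Theorems 1 and 2, the following minimal Python sketch builds a small two-stage distribution by hand and verifies both identities in exact rational arithmetic. The three states of $A^*$ and the conditional distributions of $(x', y')$ are arbitrary illustrative assumptions, not data from the paper; the helper name `expect` is ours.

```python
from fractions import Fraction as F

# Hypothetical two-stage design: A* has three states with probabilities p_A,
# and given state A_i the pair (x', y') takes the listed values with the
# listed conditional probabilities p_ij(A_i). All numbers are illustrative.
p_A = [F(1, 2), F(1, 3), F(1, 6)]
cond = [
    [((1, 2), F(1, 2)), ((3, 1), F(1, 2))],
    [((0, 4), F(1, 4)), ((2, 2), F(3, 4))],
    [((5, 0), F(1))],
]

def expect(dist, f):
    """Expected value of f((x, y)) under a list of ((x, y), probability) pairs."""
    return sum(q * f(v) for v, q in dist)

# Unconditional distribution: p_ij = p_i p_ij(A_i), as in (2.2).
joint = [(v, pa * q) for pa, dist in zip(p_A, cond) for v, q in dist]
Ex = expect(joint, lambda v: v[0])
Ey = expect(joint, lambda v: v[1])
cov_xy = expect(joint, lambda v: (v[0] - Ex) * (v[1] - Ey))

# Conditional means E(x'|A_i), E(y'|A_i) and conditional covariances.
cx = [expect(dist, lambda v: v[0]) for dist in cond]
cy = [expect(dist, lambda v: v[1]) for dist in cond]
ccov = [expect(dist, lambda v, a=a, b=b: (v[0] - a) * (v[1] - b))
        for dist, a, b in zip(cond, cx, cy)]

# Theorem 1: E{E*(x'|A)} = E x'.
thm1 = sum(pa * a for pa, a in zip(p_A, cx))
# Theorem 2: sigma_x'y' = E sigma*_x'y'|A + covariance of E*(x'|A) and E*(y'|A).
within = sum(pa * c for pa, c in zip(p_A, ccov))
between = sum(pa * (a - Ex) * (b - Ey) for pa, a, b in zip(p_A, cx, cy))

print(thm1 == Ex, cov_xy == within + between)  # True True
```

Exact fractions are used so that the identities hold with `==` rather than up to rounding.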


3. Stratified sampling and negative correlation, with an application to systematic
sampling. In discussing plans for sampling from a stratified population
it is customary to suppose that if $x'$ is an estimate and $x' = x_1 + \cdots + x_L$,
where $x_j$ is the contribution to $x'$ arising from the $j$th of the $L$ strata, then the
sampling is to be so done that the random variables $x_i$ and $x_j$, $i \neq j$, are
independent.

In [1, p. 13] it was noted that if a population were stratified, and if the elements
were so selected that the contributions from different strata were negatively
correlated, it would follow that the variance of the estimate would be less than
if the contributions were independent but had the same covariances within
strata. This was, of course, an immediate conclusion from the fact that

$$\sigma^2_{x'} = \sum_{i,j=1}^{L} \sigma_{x_ix_j}$$

and, hence, if

$$C = \sum_{i \neq j} \sigma_{x_ix_j} < 0, \tag{3.1}$$

then $\sigma^2_{x'}$ is less than it would be if $C = 0$. If $C < 0$ we shall say that the sample
design has "negative correlation."

It is obvious that any population may be taken to be itself a sample, a sample
from the possible populations that might have been produced by the forces that
determined the existing population. Inasmuch as sampling designs are often
chosen on the basis of a knowledge of the dominating forces and some past
experience, it is realistic to consider not only the expected values and variances
for a specific population but also their expected values over all possible
populations determined by the same forces. Cochran [3] has given one illustration of
the usefulness of considering the expected variance of a sample design. He
considered the elements $x_1, \dots, x_N$ of the population themselves to be random
variables and supposed that $\mathcal{E}x_i = \mu$ and $\mathcal{E}(x_i - \mu)^2 = \sigma^2$. For his purposes
it was also convenient to suppose that if $u > 0$ then
$\mathcal{E}(x_i - \mu)(x_{i+u} - \mu) = \rho_u\sigma^2$. It was then possible for him to make realistic
hypotheses concerning the correlogram, i.e. the $\rho_u$ considered as a function of $u$,
that would not have been reasonable in dealing with a specific population. He thus
obtained general conclusions concerning the expected efficiency of systematic
sampling designs as compared with random and stratified random designs.

In this paper we shall consider not only the expected values and variances
for the given finite population but also the expected values of these expected
values and variances under the assumption that the elements of the population
are themselves random variables. We shall use $\mathcal{E}$ to denote the expected value
considering the elements of the population to be random variables and, as before,
use $E$ for expected values based on the specified finite population. Then

$$\mathcal{E}\sigma^2_{x'} = \sum_{i,j=1}^{L} \mathcal{E}\sigma_{x_ix_j} = \sum_{i=1}^{L}\mathcal{E}\sigma^2_{x_i} + \mathcal{E}C,$$

and if $\mathcal{E}C < 0$ we shall say that the design has "expected negative correlation."
We now propose to obtain the beginnings of an approach to sample design
when it is possible to introduce or take advantage of negative correlation or
expected negative correlation through the sample design.
To simplify, we shall begin by considering two strata and shall suppose that
the possible values of $x'$ are $x_1, \dots, x_n$, while the possible values of $y'$ are
$y_1, \dots, y_n$. Furthermore, we shall suppose the sampling to be so done that

$$P\{x' = x_i\} = P\{y' = y_i\} = P\{x' = x_i,\ y' = y_i\} = p_i > 0,$$

so that $\sum_{i=1}^{n} p_i = 1$ and $P\{x' = x_i,\ y' = y_j\} = 0$ if $i \neq j$.

Under the above assumptions, it follows that

$$\sigma_{x'y'} = \sum_{i=1}^{n} p_ix_iy_i - \sum_{i,j=1}^{n} p_ip_jx_iy_j. \tag{3.2}$$

The symbol $c_{ij} \succeq 0$ means that $c_{ij} \ge 0$ for all $i$ and $j$ and $c_{ij} > 0$ for at least
one pair $i, j$ (and $c_{ij} \preceq 0$ means $-c_{ij} \succeq 0$). We shall say that if
$(x_i - x_j)(y_i - y_j) \succeq 0$ then the sets $(x)$ and $(y)$, where $(x)$ stands for
$x_1, \dots, x_n$ and $(y)$ for $y_1, \dots, y_n$, are similarly ordered, and if
$(x_i - x_j)(y_i - y_j) \preceq 0$ then these sets are oppositely ordered. Then it
is easy to prove directly [4, p. 43] that if the values are oppositely ordered, then
$\sigma_{x'y'} < 0$, and if they are similarly ordered, then $\sigma_{x'y'} > 0$.
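The sign rule can be illustrated numerically. The sketch below evaluates (3.2) exactly for one hypothetical set of probabilities $p_i$ and values, once with $(y)$ oppositely ordered to $(x)$ and once similarly ordered; the function name `cov_paired` is ours.

```python
from fractions import Fraction as F

def cov_paired(p, x, y):
    """sigma_{x'y'} from (3.2): x' = x_i and y' = y_i occur together with probability p_i."""
    n = len(p)
    paired = sum(p[i] * x[i] * y[i] for i in range(n))
    crossed = sum(p[i] * p[j] * x[i] * y[j] for i in range(n) for j in range(n))
    return paired - crossed

p = [F(1, 4), F(1, 4), F(1, 2)]      # hypothetical probabilities
x = [1, 2, 5]
print(cov_paired(p, x, [9, 4, 0]))   # oppositely ordered: -101/16
print(cov_paired(p, x, [0, 4, 9]))   # similarly ordered: 53/8
```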

A somewhat more general result is the following: 

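THEOREM 3. Let $n \le k$, let

$$b = \sum_{i=1}^{n}\sum_{j=1}^{k} a_{ij}w_iz_j$$

be a real bilinear form, and let

$$t = \sum_{i=1}^{n} a_{ii}w_i$$

be a real linear form, where $w_i > 0$, $z_j > 0$ and $\sum_{i=1}^{n} w_i = \sum_{j=1}^{k} z_j = 1$.
Then a sufficient condition that $b > t$ is

$$a_{ij} - a_{ii} \succeq 0, \qquad i \neq j. \tag{3.3}$$

If $k = n$ and $w_i = z_i$, then $b > t$ if

$$a_{ij} + a_{ji} - a_{ii} - a_{jj} \succeq 0, \qquad i \neq j. \tag{3.4}$$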
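PROOF. Since

$$b - t = \sum_{i=1}^{n} a_{ii}(w_iz_i - w_i) + \sum_{i \neq j} a_{ij}w_iz_j$$

and since, because $\sum_{j=1}^{k} z_j = 1$,

$$w_i = \sum_{j=1}^{k} w_iz_j,$$

it follows that

$$b - t = \sum_{i \neq j} (a_{ij} - a_{ii})\,w_iz_j.$$

Hence, $b > t$ if (3.3) holds. Also, if $k = n$ and $w_i = z_i$, then, grouping the
terms in $(i, j)$ and $(j, i)$,

$$b - t = \sum_{i < j} (a_{ij} + a_{ji} - a_{ii} - a_{jj})\,w_iw_j,$$

so that $b > t$ if (3.4) holds. Some obvious generalizations of Theorem 3 have been
omitted since we do not need them.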
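The identity on which the proof rests, $b - t = \sum_{i \neq j}(a_{ij} - a_{ii})w_iz_j$, can be checked for a random instance. In this sketch the matrix $a_{ij}$ and the positive weights are randomly generated assumptions; exact fractions keep $\sum w_i = \sum z_j = 1$ without rounding error. The second part forces (3.3) and confirms $b > t$.

```python
import random
from fractions import Fraction as F

def b_minus_t(a, w, z):
    """Return b - t and sum_{i != j} (a_ij - a_ii) w_i z_j for Theorem 3."""
    n, k = len(w), len(z)
    b = sum(a[i][j] * w[i] * z[j] for i in range(n) for j in range(k))
    t = sum(a[i][i] * w[i] for i in range(n))
    rhs = sum((a[i][j] - a[i][i]) * w[i] * z[j]
              for i in range(n) for j in range(k) if i != j)
    return b - t, rhs

def positive_weights(size, rng):
    """Random positive weights summing exactly to 1."""
    raw = [rng.randint(1, 9) for _ in range(size)]
    return [F(r, sum(raw)) for r in raw]

rng = random.Random(0)
n, k = 3, 5                        # n <= k, as Theorem 3 requires
a = [[rng.randint(-5, 5) for _ in range(k)] for _ in range(n)]
w, z = positive_weights(n, rng), positive_weights(k, rng)
lhs, rhs = b_minus_t(a, w, z)

# Enforce (3.3): make every off-diagonal a_ij strictly larger than a_ii.
a33 = [[a[i][j] if i == j else a[i][i] + abs(a[i][j]) + 1 for j in range(k)]
       for i in range(n)]
print(lhs == rhs, b_minus_t(a33, w, z)[0] > 0)   # True True
```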

To obtain the result that $\sigma_{x'y'} < 0$ if the sets $(x)$ and $(y)$ are oppositely ordered,
we make the identifications $a_{ij} = x_iy_j$ and $z_i = w_i = p_i$, so that, by (3.2),
$\sigma_{x'y'} = t - b$. Then (3.4) applies and, substituting, we have

$$a_{ii} + a_{jj} - a_{ij} - a_{ji} = (x_i - x_j)(y_i - y_j), \tag{3.5}$$

so that if the values are oppositely ordered, $\sigma_{x'y'} < 0$, and hence the two strata
have negative correlation.
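To consider expected negative correlation we note that

$$\mathcal{E}\sigma_{x'y'} = \sum_{i=1}^{n} p_i\sigma_{ii} - \sum_{i,j=1}^{n} p_ip_j\sigma_{ij}, \tag{3.6}$$

where we suppose that $\mathcal{E}x_i = \mu$, $\mathcal{E}y_i = \nu$ and

$$\mathcal{E}(x_i - \mu)(y_j - \nu) = \sigma_{ij},$$

so that in this case $\sigma_{ii}$ is a covariance, not a variance.
If we put $a_{ij} = \sigma_{ij}$ and $z_i = w_i = p_i$, then (3.4) applies and we obtain, as
sufficient for $\mathcal{E}\sigma_{x'y'}$ to be negative, that

$$\sigma_{ij} + \sigma_{ji} - \sigma_{ii} - \sigma_{jj} \succeq 0, \tag{3.7}$$

or, if we define $\rho_{ij}$ by the equation

$$\sigma_x\sigma_y\rho_{ij} = \sigma_{ij},$$

where $\sigma_x^2 = \mathcal{E}(x_i - \mu)^2$ and $\sigma_y^2 = \mathcal{E}(y_i - \nu)^2$, we have

$$\rho_{ij} + \rho_{ji} - \rho_{ii} - \rho_{jj} \succeq 0 \tag{3.8}$$

as a sufficient condition for $\mathcal{E}\sigma_{x'y'} < 0$.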
Let us consider the systematic sampling of single elements. In systematic
sampling, we assume a population of $kn$ ordered elements $x_1, x_2, \dots, x_k$,
$x_{k+1}, \dots, x_{2k}, \dots, x_{1+(n-1)k}, \dots, x_{nk}$, of which we wish to estimate the
arithmetic mean $\bar x$. As our estimate we use

$$\bar x' = (x_1' + \cdots + x_n')/n,$$

where $x_1'$ is selected at random from $x_1, \dots, x_k$, and if $x_1' = x_j$ then
$x_i' = x_{j+(i-1)k}$, $i = 2, \dots, n$. Thus, $\bar x'$ may be interpreted as an estimate based on
a stratified population, the $i$th stratum consisting of

$$x_{1+(i-1)k}, \dots, x_{k+(i-1)k},$$

and

$$P\{x_i' = x_{\alpha+(i-1)k}\} = P\{x_i' = x_{\alpha+(i-1)k},\ x_j' = x_{\alpha+(j-1)k}\} = 1/k,$$

while

$$P\{x_i' = x_{\alpha+(i-1)k},\ x_j' = x_{\beta+(j-1)k}\} = 0 \quad \text{if } \alpha \neq \beta.$$

Then

$$\sigma_{x_i'x_j'} = \frac{1}{k}\sum_{\alpha=1}^{k} x_{\alpha+(i-1)k}\,x_{\alpha+(j-1)k} - \bar x_i\bar x_j,$$

where

$$\bar x_i = \frac{1}{k}\sum_{\alpha=1}^{k} x_{\alpha+(i-1)k}.$$
Hence, any two strata that are oppositely ordered will yield a negative
contribution to the variance. However, since it is not possible for all pairs of strata
to be oppositely ordered when there are more than two strata, we do not thus obtain
a useful result and must return to the consideration of $C$ or of $\sigma^2_{\bar x'}$ itself, as was
done in [1]. If, however, we make Cochran's assumptions and consider
$\mathcal{E}\sigma_{x_i'x_j'}$, it follows that for the $i$th and $j$th strata ($i < j$)

$$\rho_{\alpha\beta} = \rho_{(j-i)k+\beta-\alpha},$$

and (3.8) becomes

$$\rho_{(j-i)k+(\beta-\alpha)} + \rho_{(j-i)k+(\alpha-\beta)} - 2\rho_{(j-i)k} \succeq 0, \tag{3.9}$$

i.e. the correlation function $\rho_u$ must be concave upwards, which Cochran showed
by other means. By considering $\mathcal{E}C$ it is possible to show that a sort of average
concavity is all that is required of the correlogram for systematic sampling to
have a smaller variance than stratified random sampling.
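A small sketch of the two computations above, under stated assumptions. The first part checks, for an arbitrary finite population of $nk$ values, that the variance of $\bar x'$ obtained by enumerating the $k$ random starts equals $n^{-2}\sum_{i,j}\sigma_{x_i'x_j'}$ with $\sigma_{x_i'x_j'}$ as given above. The second part evaluates $\mathcal{E}\sigma_{x_i'x_j'}/\sigma^2 = \rho_{(j-i)k} - k^{-2}\sum_{\alpha,\beta}\rho_{(j-i)k+\beta-\alpha}$ under Cochran's model for an assumed exponential correlogram $\rho_u = e^{-0.3u}$, which is concave upwards, and for an assumed linear one, for which the expected covariance vanishes. The population values and function names are illustrative.

```python
import math
from fractions import Fraction as F

def strata(xs, k):
    """The n strata of k consecutive elements of an ordered population of n*k values."""
    return [xs[i:i + k] for i in range(0, len(xs), k)]

def cov_between(si, sj):
    """sigma_{x_i'x_j'} = (1/k) sum_alpha x_{alpha,i} x_{alpha,j} - xbar_i xbar_j."""
    k = len(si)
    return F(sum(a * b for a, b in zip(si, sj)), k) - F(sum(si), k) * F(sum(sj), k)

def systematic_variance(xs, k):
    """Exact variance of the systematic mean, enumerating the k equally likely starts."""
    n = len(xs) // k
    means = [F(sum(xs[a::k]), n) for a in range(k)]
    centre = sum(means) / k
    return sum((mbar - centre) ** 2 for mbar in means) / k

def expected_cov(rho, i, j, k):
    """E sigma_{x_i'x_j'} / sigma^2 under Cochran's model, strata i < j (1-based)."""
    d = (j - i) * k
    average = sum(rho(d + b - a) for a in range(k) for b in range(k)) / k ** 2
    return rho(d) - average

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # hypothetical population, n = 4, k = 3
S = strata(xs, 3)
decomposed = sum(cov_between(si, sj) for si in S for sj in S) / len(S) ** 2
print(decomposed == systematic_variance(xs, 3))                   # True
print(expected_cov(lambda u: math.exp(-0.3 * u), 1, 2, 3) < 0)    # True: concave upwards
print(abs(expected_cov(lambda u: 1 - u / 100, 1, 3, 4)) < 1e-12)  # True: linear
```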


4. Conditions for negative correlation when the strata are of unequal sizes,
with an application to systematic sampling. Often, as in the systematic selection
of clusters with probability proportionate to size (discussed in Section 5), the
simplified situation dealt with in Theorem 3 does not directly apply. However,
Theorem 3 may be used to advantage by the following device.

Let us suppose the possible values of $x'$ to be $x_1, \dots, x_n$ and those of $y_0$ to be
$y_1, \dots, y_k$, $k \ge n$, and let

$$P\{y_0 = y_\beta \mid x' = x_\alpha\} = p_{\beta\mid\alpha},$$
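so that if we define

$$\bar y_\alpha = \sum_{\beta=1}^{k} y_\beta\,p_{\beta\mid\alpha}, \tag{4.1}$$

then

$$\bar y_\alpha = E(y_0 \mid x' = x_\alpha).$$

If we define $y'$ to be a random variable having possible values $\bar y_1, \dots, \bar y_n$
with probabilities $p_1, \dots, p_n$, where

$$p_\alpha = P\{x' = x_\alpha\},$$

it follows that

$$y' = E^*(y_0 \mid x')$$

and, by Theorem 2 (the conditional covariance of $x'$ and $y_0$ given $x'$ being zero),

$$\sigma_{x'y_0} = \sigma_{x'y'}.$$

Clearly, Theorem 3 is valid for the random variables $x'$ and $y'$.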

Consequently, we need only determine what restrictions the conditional
probabilities $p_{\beta\mid\alpha}$ and the values $y_\beta$ need satisfy for the sets $x_1, \dots, x_n$ and
$\bar y_1, \dots, \bar y_n$ to be oppositely ordered or for (3.7) to hold.
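The passage from $p_{\beta\mid\alpha}$ to $\bar y_\alpha$ in (4.1), and the identity $\sigma_{x'y_0} = \sigma_{x'y'}$, can be traced in a toy example. The values and conditional probabilities below are hypothetical; they were chosen so that $\bar y_1 > \bar y_2 > \bar y_3$ while $x_1 < x_2 < x_3$, so the sets are oppositely ordered and the covariance should come out negative.

```python
from fractions import Fraction as F

# Hypothetical example: x' = x_alpha with probability p_alpha and, given that,
# y_0 = y_beta with conditional probability p_{beta|alpha}.
x = [1, 4, 6]
p = [F(1, 3), F(1, 3), F(1, 3)]
y = [10, 7, 2, 0]
p_cond = [                        # row alpha lists p_{beta|alpha}, beta = 1..4
    [F(1, 2), F(1, 2), 0, 0],
    [0, F(1, 4), F(3, 4), 0],
    [0, 0, F(1, 2), F(1, 2)],
]

# (4.1): ybar_alpha = sum_beta y_beta p_{beta|alpha} = E(y_0 | x' = x_alpha)
ybar = [sum(yb * q for yb, q in zip(y, row)) for row in p_cond]

Ex = sum(pa * xa for pa, xa in zip(p, x))
Ey0 = sum(pa * q * yb for pa, row in zip(p, p_cond) for yb, q in zip(y, row))
cov_x_y0 = sum(pa * q * (xa - Ex) * (yb - Ey0)
               for pa, xa, row in zip(p, x, p_cond) for yb, q in zip(y, row))
# y' takes the value ybar_alpha with probability p_alpha; E y' = E y_0 by Theorem 1.
cov_x_yprime = sum(pa * (xa - Ex) * (ya - Ey0) for pa, xa, ya in zip(p, x, ybar))

print(ybar, cov_x_y0 == cov_x_yprime, cov_x_y0 < 0)
```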

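Substituting for $\bar y_\alpha$ and $\bar y_\gamma$ in (3.5), we see that if

$$(x_\alpha - x_\gamma)\sum_{\beta=1}^{k} y_\beta\,(p_{\beta\mid\alpha} - p_{\beta\mid\gamma}) \preceq 0, \tag{4.2}$$

then $\sigma_{x'y'} = \sigma_{x'y_0} < 0$.
Let

$$\sigma_{\alpha\beta} = \mathcal{E}(x_\alpha - \mu)(y_\beta - \nu).$$

Then, substituting in (3.7), we see that if

$$\sum_{\beta=1}^{k}(p_{\beta\mid\alpha} - p_{\beta\mid\gamma})(\sigma_{\alpha\beta} - \sigma_{\gamma\beta}) \preceq 0, \tag{4.3}$$

or if

$$\sum_{\beta=1}^{k}(p_{\beta\mid\alpha} - p_{\beta\mid\gamma})(\rho_{\alpha\beta} - \rho_{\gamma\beta}) \preceq 0, \tag{4.4}$$

then

$$\mathcal{E}\sigma_{x'y_0} < 0.$$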

In order to use (4.2) and (4.3) the following well-known lemma is often useful.

LEMMA. If $\xi_1 \le \xi_2 \le \cdots \le \xi_k \le 0$ and the quantities $\epsilon_1, \dots, \epsilon_k$ are such that

$$\sum_{s=1}^{r} \epsilon_s \ge 0, \qquad r = 1, \dots, k,$$

then

$$\sum_{s=1}^{k} \xi_s\epsilon_s \le 0.$$


Let us use this lemma to obtain another theorem that will be helpful in showing 
negative or expected negative correlation between strata. 
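THEOREM 4. Let $b$ be a bilinear form

$$b = \sum_{i=1}^{n}\sum_{j=1}^{m} a_{ij}w_iz_j$$

such that

$$\sum_{i=1}^{s} w_i \ge 0, \quad \sum_{j=1}^{s'} z_j \ge 0, \qquad s = 1, \dots, n-1,\ s' = 1, \dots, m-1,$$

and

$$\sum_{i=1}^{n} w_i = \sum_{j=1}^{m} z_j = 0. \tag{4.5}$$

Let

$$\delta_{ij} = a_{ij} - a_{i+1,j} - a_{i,j+1} + a_{i+1,j+1}.$$

Then a sufficient condition that $b \le 0$ is $\delta_{ij} \le 0$ for all $i$ and $j$.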
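PROOF. Upon substituting for $w_n$ and $z_m$ in $b$ from (4.5) we see that

$$b = \sum_{i=1}^{n-1}\sum_{j=1}^{m-1} \theta_{ij}w_iz_j,$$

where

$$\theta_{ij} = a_{ij} - a_{im} - a_{nj} + a_{nm},$$

or, if we define

$$\xi_i = \sum_{j=1}^{m-1} \theta_{ij}z_j,$$

then

$$b = \sum_{s=1}^{n-1} \xi_sw_s.$$

According to the lemma, it then follows that a sufficient condition that $b \le 0$ is
that

$$\xi_1 \le \xi_2 \le \cdots \le \xi_{n-1} \le 0.$$

Since $\theta_{im} = \theta_{nj} = 0$, the first differences $\theta_{ij} - \theta_{i,j+1}$ and $\theta_{ij} - \theta_{i+1,j}$ are
sums of the second differences $\theta_{ij} - \theta_{i,j+1} - \theta_{i+1,j} + \theta_{i+1,j+1}$, and a second
application of the lemma, to the sums over $j$, shows that these inequalities hold when
every such second difference is nonpositive. Then to complete the proof it is only
necessary to verify that

$$\delta_{ij} = \theta_{ij} - \theta_{i,j+1} - \theta_{i+1,j} + \theta_{i+1,j+1}.$$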
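The lemma and Theorem 4 can be exercised numerically. The sketch below is an illustrative check rather than a transcription of the proof: instead of the $\theta_{ij}$ route it verifies the equivalent double summation-by-parts identity $b = \sum_{s<n}\sum_{t<m}\delta_{st}W_sZ_t$, where $W_s$ and $Z_t$ are the partial sums of the $w_i$ and $z_j$, from which $b \le 0$ follows at once when every $\delta_{st} \le 0$. The coefficients $a_{ij} = x_iy_j$, with $x$ increasing and $y$ decreasing, anticipate the application made below; all numbers and helper names are illustrative assumptions.

```python
import random
from itertools import accumulate

def nonneg_partials_zero_total(size, rng):
    """Random integers whose partial sums are all >= 0 and whose total is 0."""
    while True:
        w = [rng.randint(-4, 4) for _ in range(size - 1)]
        w.append(-sum(w))
        if all(s >= 0 for s in accumulate(w)):
            return w

def delta(a, i, j):
    """delta_ij = a_ij - a_{i+1,j} - a_{i,j+1} + a_{i+1,j+1} (0-based indices)."""
    return a[i][j] - a[i + 1][j] - a[i][j + 1] + a[i + 1][j + 1]

def theorem4_parts(a, w, z):
    """Return b and sum_{s<n, t<m} delta_st W_s Z_t, W and Z being partial sums."""
    n, m = len(w), len(z)
    b = sum(a[i][j] * w[i] * z[j] for i in range(n) for j in range(m))
    W, Z = list(accumulate(w)), list(accumulate(z))
    abel = sum(delta(a, s, t) * W[s] * Z[t]
               for s in range(n - 1) for t in range(m - 1))
    return b, abel

rng = random.Random(1)

# Lemma: xi_1 <= ... <= xi_k <= 0 and every partial sum of eps is >= 0.
xi = sorted(rng.randint(-9, 0) for _ in range(5))
eps = nonneg_partials_zero_total(5, rng)
lemma_value = sum(u * e for u, e in zip(xi, eps))

# Theorem 4 with a_ij = x_i y_j, x increasing and y decreasing, so every delta_ij <= 0.
x = [1, 2, 4, 7]
y = [9, 6, 5, 1, 0]
a = [[xv * yv for yv in y] for xv in x]
w = nonneg_partials_zero_total(len(x), rng)
z = nonneg_partials_zero_total(len(y), rng)
b, abel = theorem4_parts(a, w, z)
print(lemma_value <= 0, b == abel, b <= 0)   # True True True
```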

In the preceding pages we have given an identification of systematic with 
stratified sampling where, instead of the selection being made independently 
within strata, the choice of an element from one stratum determines the choice 
from the other strata. In this identification, however, it was assumed that the 
strata contained the same number of elements. Let us now extend this method 
of selecting samples to the case where the strata have different numbers of 
elements. In so doing we shall illustrate the use of the above lemma and Theorem 4.

Suppose now that the population consists of $N$ elements $x_1, \dots, x_N$ classified
into $n$ strata, the $i$th of which contains the $N_i$ elements

$$x_{N_1+\cdots+N_{i-1}+1}, \dots, x_{N_1+\cdots+N_i}.$$

We shall denote these elements by $x_{i1}, \dots, x_{iN_i}$.

We shall select one element from each of these $n$ strata. The element selected
from the $i$th stratum is written $x_i'$. As the estimate of $\bar x$, the arithmetic mean of
the population, we use

$$\bar x' = \sum_{i=1}^{n} \frac{N_i}{N}\,x_i',$$

and it is well known that if the selection is made independently at random from
each stratum, then

$$\sigma^2_{\bar x'} = \sum_{i=1}^{n}\left(\frac{N_i}{N}\right)^2\sigma_i^2,$$

where $\sigma_i^2$ is the variance of $x_i'$, i.e. the variance of the $i$th stratum.

Let us now consider an alternative to the usual method. We can suppose that
$N_1 > 1$ without any loss of generality. (The methods are the same for any stratum
having $N_i = 1$ and will also yield the same result for any population such that
either all the $N_i = 1$ or all but one of the $N_i = 1$. Differences occur if at least two
of the $N_i$ differ from 1.)

We first choose an element at random from the first stratum. Suppose that
$x_1' = x_{1\alpha}$. Then to choose an element from the second stratum, assuming that
$N_2 > 1$, we proceed as follows. Multiply $N_2$ by any positive integer $t_2$ such that
$N_2t_2/N_1$ is an integer, say $k_2$. Assign to each element of the second stratum the
measure of size $t_2$, and form the two sets of cumulative totals $t_2, 2t_2, \dots, N_2t_2$
and $k_2, 2k_2, \dots, N_1k_2$. Then, with the measure of size $t_2$ assigned to each element
of stratum 2 and the measure of size $k_2$ assigned to each element of stratum 1,
it follows that strata 1 and 2 have the same total size.

As an example of the arithmetic given below, consider the following simple case.
Suppose that $N_1 = 3$ and $N_2 = 4$. Then if we take for $t_2$ the value 6, it follows
that $k_2 = 8$. We choose one of the integers 1, 2, 3 with equal probability. If the
integer 1 is obtained, we have selected the first element of the first stratum and
choose an integer between 1 and 8 with equal probability. If the selected integer
is between 1 and 6, the first element of the second stratum is selected. If it is
7 or 8, the second element of the second stratum is selected. Similarly, if the
second element of the first stratum is selected, then we select an integer between
9 and 16 with equal probability. If that integer has value $9, \dots, 12$, the second
element of the second stratum is selected; if it has value $13, \dots, 16$, the third
element is selected.

The general formulation of the selection procedure for the second stratum is as follows.

Suppose that $\beta_0$ is the smallest integer such that $(\alpha - 1)k_2 + 1 \le \beta_0t_2$ and that
$\beta_1$ is such that $(\beta_1 - 1)t_2 < \alpha k_2 \le \beta_1t_2$. Choose an integer at random from
$1, \dots, k_2$ and call that integer $\beta$. Then, if

$$(\alpha - 1)k_2 < (\alpha - 1)k_2 + \beta \le \beta_0t_2,$$

the $\beta_0$th element is selected from stratum 2; if

$$\beta_0t_2 < (\alpha - 1)k_2 + \beta \le (\beta_0 + 1)t_2,$$

the $(\beta_0 + 1)$th element is selected; $\dots$; and if

$$(\beta_1 - 1)t_2 < (\alpha - 1)k_2 + \beta \le \alpha k_2,$$

the $\beta_1$th element is selected from stratum 2.
It is easy to verify that when the sample is so selected, each element of stratum
2 has equal probability of being selected. Hence, if we apply this procedure to
each stratum, we have

$$\sigma^2_{\bar x'} = \sum_{i=1}^{n}\left(\frac{N_i}{N}\right)^2\sigma_i^2 + \sum_{i \neq j}\frac{N_iN_j}{N^2}\,\sigma_{x_i'x_j'}.$$
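The selection rule described above can be sketched as follows, taking $t_2$ and $k_2 = N_2t_2/N_1$ as in the text. The position $u = (\alpha - 1)k_2 + \beta$ on the cumulative scale picks the element $g$ of stratum 2 with $(g - 1)t_2 < u \le gt_2$, i.e. $g = \lceil u/t_2 \rceil$. Run on the example $N_1 = 3$, $N_2 = 4$, $t_2 = 6$, it reproduces the integer ranges listed there and confirms that each element of stratum 2 has probability $1/4$. The function name `joint_probs` is ours.

```python
from fractions import Fraction as F

def joint_probs(N1, N2, t2):
    """P{x_1' = x_{1,alpha}, x_2' = x_{2,g}} for the linked selection of Section 4.

    t2 is the measure of size of each element of stratum 2, and k2 = N2*t2/N1
    (required to be an integer) that of each element of stratum 1.
    """
    if (N2 * t2) % N1:
        raise ValueError("N2 * t2 / N1 must be an integer")
    k2 = N2 * t2 // N1
    probs = {}
    for alpha in range(1, N1 + 1):          # element of stratum 1, prob 1/N1
        for beta in range(1, k2 + 1):       # random integer, prob 1/k2
            u = (alpha - 1) * k2 + beta     # position on the cumulative scale
            g = -(-u // t2)                 # ceil(u / t2): (g - 1) t2 < u <= g t2
            probs[alpha, g] = probs.get((alpha, g), 0) + F(1, N1 * k2)
    return probs

probs = joint_probs(3, 4, 6)                # the example of the text: k2 = 8
marginal = {g: sum(q for (_, h), q in probs.items() if h == g) for g in range(1, 5)}
print(marginal)                             # every element of stratum 2: 1/4
```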
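Let us evaluate $\sigma_{x_i'x_j'}$ for this type of selection. Now

$$\sigma_{x_i'x_j'} = E(x_i' - \bar x_i)(x_j' - \bar x_j),$$

where $\bar x_i$ is the arithmetic mean of the elements of the $i$th stratum. From Theorem
2, conditioning on the element $x_1' = x_{1\alpha}$ selected from the first stratum, we then have

$$\sigma_{x_i'x_j'} = \sum_{\alpha=1}^{N_1}\frac{1}{N_1}E\{[x_i' - E(x_i' \mid x_{1\alpha})][x_j' - E(x_j' \mid x_{1\alpha})] \mid x_{1\alpha}\} + \sum_{\alpha=1}^{N_1}\frac{1}{N_1}[E(x_i' \mid x_{1\alpha}) - \bar x_i][E(x_j' \mid x_{1\alpha}) - \bar x_j].$$

It is easy to see that the method of selection used above implies that the first
term of $\sigma_{x_i'x_j'}$ vanishes. Furthermore, $\bar x_i$ is the arithmetic mean of the conditional
expectations $E(x_i' \mid x_{1\alpha})$, so that we have reduced the problem to one of determining
whether the conditional expectations satisfy the conditions for negative correlation or
expected negative correlation.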

If we denote $E(x_i' \mid x_{1\alpha})$ by $y_{i\alpha}$, then we need to see whether the sets
$y_{i1}, \dots, y_{iN_1}$ and $y_{j1}, \dots, y_{jN_1}$ are oppositely ordered. Now

$$(y_{i\alpha} - y_{i\beta})(y_{j\alpha} - y_{j\beta}) = \sum_{g=1}^{N_i}\sum_{h=1}^{N_j} x_{ig}x_{jh}\,\epsilon_{ig\alpha\beta}\,\epsilon_{jh\alpha\beta},$$

where

$$\epsilon_{ig\alpha\beta} = P\{x_i' = x_{ig} \mid x_{1\alpha}\} - P\{x_i' = x_{ig} \mid x_{1\beta}\}.$$


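If $\alpha < \beta$ then, according to the method of selection,

$$\sum_{g=1}^{s} \epsilon_{ig\alpha\beta} \ge 0, \qquad s = 1, \dots, N_i - 1,$$

while

$$\sum_{g=1}^{N_i} \epsilon_{ig\alpha\beta} = 0.$$

In Theorem 4, we then make the identifications $n = N_i$, $m = N_j$,

$$w_g = \epsilon_{ig\alpha\beta}, \quad z_h = \epsilon_{jh\alpha\beta} \quad \text{and} \quad a_{gh} = x_{ig}x_{jh}.$$

Then

$$\delta_{gh} = (x_{ig} - x_{i,g+1})(x_{jh} - x_{j,h+1}),$$

and hence to have negative correlation between the strata, it is sufficient that
the sets $x_{i1}, \dots, x_{iN_i}$ and $x_{j1}, \dots, x_{jN_j}$ have the type of negative ordering
represented by $\delta_{gh} \le 0$. Similarly, if

$$\sigma_{gh} = \mathcal{E}(x_{ig} - \mu_i)(x_{jh} - \mu_j), \qquad \mu_i = \mathcal{E}x_{ig},$$

then, for expected negative correlation, it is sufficient that

$$\sigma_{gh} - \sigma_{g,h+1} - \sigma_{g+1,h} + \sigma_{g+1,h+1} \le 0.$$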


Of course, these conditions will be satisfied if a concave upwards correlogram
exists. Hence, if a population consists of $N$ random variables $x_1, \dots, x_N$ having
a concave upwards correlogram, then, no matter into what strata these elements
are classified, provided that the order of occurrence of the elements remains
unaltered, the systematic selection of the elements in the sample can be so planned
as to yield an estimate having smaller variance than the stratified random
selection of the elements in the sample, even if optimum allocation is used. If more
than one element is being selected from a stratum under optimum allocation,
then the systematic selection of the same number of elements will suffice. If not
only optimum allocation but also optimum definitions of strata are being used,
so that but one element is selected from each stratum, then systematic selection
according to the scheme described in this section will produce a variance not
larger than the variance of stratified random sampling. It should be noted,
however, that this does not imply that a ‘hammer and tongs’ use of systematic
sampling ignoring the strata will produce a smaller variance. There is work to
be done on what is required for the latter to occur.
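The conclusion can be checked in a small case. The sketch below works under Cochran's model with an assumed correlogram $\rho_u = e^{-0.5u}$ (concave upwards) and computes $\mathcal{E}\sigma_{x_1'x_2'}/\sigma^2 = \sum_{\alpha,g}P\{x_1' = x_{1\alpha},\ x_2' = x_{2g}\}\rho_d - (N_1N_2)^{-1}\sum_{\alpha,g}\rho_d$, where $d$ is the distance between the two elements in the original order, for two adjacent strata linked by the procedure of this section. Independent selection within strata would give zero, so a negative value means the linked selection has the smaller expected variance. The stratum sizes, $t_2$ and the helper names are hypothetical.

```python
import math

def joint_probs(N1, N2, t2):
    """Linked selection of Section 4: P{x_1' = x_{1,alpha}, x_2' = x_{2,g}} as floats."""
    k2, rem = divmod(N2 * t2, N1)
    if rem:
        raise ValueError("N2 * t2 / N1 must be an integer")
    probs = {}
    for alpha in range(1, N1 + 1):
        for beta in range(1, k2 + 1):
            g = -(-((alpha - 1) * k2 + beta) // t2)   # ceil(u / t2)
            probs[alpha, g] = probs.get((alpha, g), 0.0) + 1.0 / (N1 * k2)
    return probs

def expected_cov(N1, N2, t2, rho):
    """E sigma_{x_1'x_2'} / sigma^2 when stratum 1 occupies positions 1..N1 and
    stratum 2 positions N1+1..N1+N2 of the original order (Cochran's model)."""
    def dist(a, g):
        return N1 + g - a                  # distance between x_{1a} and x_{2g}
    linked = sum(q * rho(dist(a, g)) for (a, g), q in joint_probs(N1, N2, t2).items())
    average = sum(rho(dist(a, g)) for a in range(1, N1 + 1)
                  for g in range(1, N2 + 1)) / (N1 * N2)
    return linked - average

def rho(u):
    return math.exp(-0.5 * u)              # assumed correlogram, concave upwards

print(expected_cov(3, 4, 6, rho), expected_cov(2, 5, 2, rho))   # both negative
```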
It may be noted that the procedure of this example provides an answer to the
problem of the systematic selection of elements from a population whose size is not
a multiple of the size of sample.


5. The systematic sampling of clusters with probability proportionate to a
measure of size. It is known [5] that sampling clusters with probability
proportionate to a measure of size often yields considerable reductions in the variance
of the estimates. However, the theory of the systematic selection of several
clusters with probability proportionate to a measure of size has not been worked
out, and it is the purpose of this section to make some contributions to that
theory.

The most frequently used method of sampling clusters with probability
proportionate to size is equivalent to the following. Suppose that the clusters are
denoted by $C_1, \dots, C_M$ and that to the $h$th of these $M$ clusters is assigned a
measure of size $P_h$. Form the successive totals $P_1$, $P_1 + P_2$, $P_1 + P_2 + P_3$, $\dots$,
$P_1 + \cdots + P_M$. If we wish to select $m$ of these clusters, we calculate
$P_m = (P_1 + \cdots + P_M)/m$. Then, assuming that $P_j \le P_m$, $j = 1, \dots, M$, we select
an integer with equal probability from $1, \dots, P_m$. Calling that integer $P'$, we
calculate the $m$ numbers $P'$, $P' + P_m$, $P' + 2P_m$, $\dots$, $P' + (m - 1)P_m$.
If

$$P_1 + \cdots + P_{h-1} < P' + (i - 1)P_m \le P_1 + \cdots + P_h \tag{5.1}$$

for any integer $i$, $i = 1, \dots, m$, then the cluster $C_h$ is selected for the sample.
Any cluster for which $P_h > P_m$ is automatically included in the sample, and if
there are, say, $a$ such clusters, then we calculate $P_{m-a}$ for the $M - a$ clusters
remaining after including these $a$ in the sample, and proceed as above.
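The selection rule (5.1) can be sketched in a few lines. `pps_systematic` is a hypothetical helper, and it assumes, as the text does, that $P_m$ is an integer and every $P_h \le P_m$ (the preliminary step that sets aside clusters with $P_h > P_m$ is not implemented). Enumerating all $P_m$ equally likely values of $P'$ shows that $C_h$ enters the sample with probability $mP_h/P$, where $P = P_1 + \cdots + P_M$; the sizes used are illustrative.

```python
from fractions import Fraction as F

def pps_systematic(sizes, m, start):
    """1-based clusters selected by rule (5.1) for the random start P' = start.

    Assumes, as in the text, that P_m = sum(sizes)/m is an integer and that
    every P_h <= P_m; clusters with P_h > P_m are not handled here.
    """
    total = sum(sizes)
    if total % m or max(sizes) > total // m:
        raise ValueError("need an integer P_m and every P_h <= P_m")
    interval = total // m                   # P_m
    cumulative, running = [], 0
    for s in sizes:
        running += s
        cumulative.append(running)          # P_1 + ... + P_h
    chosen = []
    for i in range(m):
        point = start + i * interval        # P' + (i - 1) P_m in the text's indexing
        chosen.append(next(h for h, c in enumerate(cumulative, 1) if point <= c))
    return chosen

sizes = [3, 5, 2, 6, 4, 4]                  # hypothetical measures of size, P = 24
m = 4                                       # so P_m = 6
interval = sum(sizes) // m
inclusion = {h: F(0) for h in range(1, len(sizes) + 1)}
for start in range(1, interval + 1):        # P' uniform on 1, ..., P_m
    for h in pps_systematic(sizes, m, start):
        inclusion[h] += F(1, interval)
print(inclusion)                            # equals m * P_h / P for every h
```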

In deriving the variance of the estimate we shall use, we interpret that estimate
as a stratified sampling estimate. Although it is easy to obtain the expected
value of the estimate without that interpretation, we shall need it later in the
derivation of the variance, and hence we give it here to shorten the total
presentation a little.


Suppose that clusters $C_1, \dots, C_{k_1}$ are such that

$$P_1 + \cdots + P_{k_1-1} < P_m \le P_1 + \cdots + P_{k_1}.$$

Then we define stratum 1 to consist of clusters $C_1, \dots, C_{k_1}$. It is easy to see
that if the above sampling method is used, then

$$P\{C_h \text{ is selected from stratum 1}\} = \frac{P_h}{P_m}, \qquad h < k_1,$$

$$P\{C_{k_1} \text{ is selected from stratum 1}\} = \frac{P_m - P_1 - \cdots - P_{k_1-1}}{P_m}.$$

Furthermore, suppose that clusters $C_{k_1}, \dots, C_{k_1+k_2}$ are such that

$$P_1 + \cdots + P_{k_1+k_2-1} < 2P_m \le P_1 + \cdots + P_{k_1+k_2}.$$
Then we define stratum 2 to consist of clusters $C_{k_1}, \dots, C_{k_1+k_2}$. It is easy to
see that if the above sampling method is used, then

$$P\{C_{k_1} \text{ is selected from stratum 2}\} = \frac{P_1 + \cdots + P_{k_1} - P_m}{P_m},$$

$$P\{C_{k_1+h} \text{ is selected from stratum 2}\} = \frac{P_{k_1+h}}{P_m}, \qquad 1 \le h < k_2,$$

$$P\{C_{k_1+k_2} \text{ is selected from stratum 2}\} = \frac{2P_m - P_1 - \cdots - P_{k_1+k_2-1}}{P_m}.$$

Since $P_h \le P_m$, we remark that it is impossible that $C_{k_1}$ be selected from both
stratum 1 and stratum 2.


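In general, if clusters $C_{k_1+\cdots+k_{i-1}}, \dots, C_{k_1+\cdots+k_i}$ are such that

$$P_1 + \cdots + P_{k_1+\cdots+k_i-1} < iP_m \le P_1 + \cdots + P_{k_1+\cdots+k_i}, \tag{5.2}$$

then the $i$th stratum consists of these $k_i + 1$ clusters, and we define the
probabilities $p_{i\alpha}$, $\alpha = 0, \dots, k_i$, by the equations

$$p_{i0} = P\{C_{k_1+\cdots+k_{i-1}} \text{ is selected from stratum } i\} = \frac{P_1 + \cdots + P_{k_1+\cdots+k_{i-1}} - (i-1)P_m}{P_m},$$

$$p_{i\alpha} = P\{C_h \text{ is selected from stratum } i\} = \frac{P_h}{P_m}, \qquad h = k_1 + \cdots + k_{i-1} + \alpha,\ 0 < \alpha < k_i, \tag{5.3}$$

$$p_{ik_i} = P\{C_{k_1+\cdots+k_i} \text{ is selected from stratum } i\} = \frac{iP_m - P_1 - \cdots - P_{k_1+\cdots+k_i-1}}{P_m}.$$

We remark that

$$p_{ik_i} + p_{i+1,0} = \frac{P_{k_1+\cdots+k_i}}{P_m}. \tag{5.4}$$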
Now, let the elements of the population be $x_{hj}$, $h = 1, \dots, M$, $j = 1, \dots, N_h$,
and let the arithmetic mean of the $h$th cluster be denoted by $\bar x_h$. Since the
$N_h$ are usually unknown but the measure of size, $P_h$, is known, we sample, not
with probability proportionate to the $N_h$, but with probability proportionate
to the $P_h$. We shall denote the clusters of the $i$th stratum by $C_{i0}, \dots, C_{ik_i}$,
making the identification

$$C_{i\alpha} = C_{\alpha+k_1+\cdots+k_{i-1}}. \tag{5.5}$$

Furthermore, the numbers of elements of the clusters are denoted by
$N_{i0}, \dots, N_{ik_i}$, and the means of the clusters by $\bar x_{i0}, \dots, \bar x_{ik_i}$, where

$$N_{i\alpha} = N_{\alpha+k_1+\cdots+k_{i-1}}, \qquad \bar x_{i\alpha} = \bar x_{\alpha+k_1+\cdots+k_{i-1}}, \tag{5.6}$$

so that $\bar x_{i0} = \bar x_{i-1,k_{i-1}}$ and $N_{i0} = N_{i-1,k_{i-1}}$, $i = 2, \dots, m$.
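Furthermore, we define

$$P_{i\alpha} = P_m\,p_{i\alpha}, \qquad \alpha = 0, \dots, k_i, \tag{5.7}$$

so that $P_{i\alpha}$ is the part of the measure of size of $C_{i\alpha}$ that falls in the $i$th stratum
and $P_{i0} + \cdots + P_{ik_i} = P_m$. We define the mean of the $i$th stratum to be

$$\bar x_i = \sum_{\alpha=0}^{k_i}\frac{P_{i\alpha}}{P_m}\,\bar x_{i\alpha} \tag{5.8}$$

and the variance of the $i$th stratum to be

$$\sigma_i^2 = \sum_{\alpha=0}^{k_i}\frac{P_{i\alpha}}{P_m}(\bar x_{i\alpha} - \bar x_i)^2. \tag{5.9}$$

Then, if the mean and variance of the population are defined to be

$$\bar x = \sum_{h=1}^{M}\frac{P_h}{P}\,\bar x_h \tag{5.10}$$

and

$$\sigma^2 = \sum_{h=1}^{M}\frac{P_h}{P}(\bar x_h - \bar x)^2, \tag{5.11}$$

where $P = P_1 + \cdots + P_M = mP_m$, it is easy to verify that

$$\bar x = \frac{1}{m}\sum_{i=1}^{m}\bar x_i \tag{5.12}$$

and

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\sigma_i^2 + \frac{1}{m}\sum_{i=1}^{m}(\bar x_i - \bar x)^2. \tag{5.13}$$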
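The stratum construction and the identities (5.4), (5.12) and (5.13) can be checked numerically. In this sketch the probabilities $p_{i\alpha} = P_{i\alpha}/P_m$ are obtained as the length of the overlap of each cluster's interval on the cumulative scale with the stratum interval $((i-1)P_m, iP_m]$, divided by $P_m$; the measures of size, the cluster means and the name `strata_probs` are hypothetical.

```python
from fractions import Fraction as F

def strata_probs(sizes, m):
    """For stratum i = 1..m, the pairs (h, p_{i alpha}) of Section 5.

    Cluster C_h occupies (P_1+...+P_{h-1}, P_1+...+P_h] on the cumulative scale;
    p_{i alpha} is its overlap with ((i-1)P_m, iP_m] divided by P_m.
    """
    interval = F(sum(sizes), m)             # P_m
    cumulative = [0]
    for s in sizes:
        cumulative.append(cumulative[-1] + s)
    result = []
    for i in range(1, m + 1):
        lo, hi = (i - 1) * interval, i * interval
        row = []
        for h in range(1, len(sizes) + 1):
            overlap = min(cumulative[h], hi) - max(cumulative[h - 1], lo)
            if overlap > 0:
                row.append((h, overlap / interval))
        result.append(row)
    return result

sizes = [3, 5, 2, 6, 4, 4]                       # hypothetical P_h, P = 24
means = [F(2), F(5), F(1), F(4), F(7), F(3)]     # hypothetical cluster means
m = 4
P, Pm = sum(sizes), F(sum(sizes), m)
st = strata_probs(sizes, m)

xbar = sum(s * x for s, x in zip(sizes, means)) / P                      # (5.10)
sigma2 = sum(F(s, P) * (x - xbar) ** 2 for s, x in zip(sizes, means))   # (5.11)
xbar_i = [sum(p * means[h - 1] for h, p in row) for row in st]          # (5.8)
sigma2_i = [sum(p * (means[h - 1] - xi) ** 2 for h, p in row)           # (5.9)
            for row, xi in zip(st, xbar_i)]
share = {h: sum(p for row in st for hh, p in row if hh == h)
         for h in range(1, len(sizes) + 1)}
print(xbar == sum(xbar_i) / m,                                          # (5.12)
      sigma2 == (sum(sigma2_i) + sum((xi - xbar) ** 2 for xi in xbar_i)) / m)  # (5.13)
```

Here `share[h]` adds the probabilities of $C_h$ over the strata it belongs to, which by (5.4) should equal $P_h/P_m$.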


An unbiased estimate of the total of a characteristic. We shall see that we can
obtain an unbiased estimate of $x$, where

$$x = \sum_{h=1}^{M}\sum_{j=1}^{N_h} x_{hj},$$

i.e. $x$ is the total of the elements of the population. Since $N = N_1 + \cdots + N_M$ is
unknown, the estimate of the population mean per element that is used is the ratio of
unbiased estimates of $x$ and $N$. It is well known that this ratio is usually biased.
Since we are not making any study of ratio estimates here, we will not derive the
approximation to the variance of this estimate. It may be remarked that it can be
obtained by a simple extension of the results here given.
Let us agree that the general form of the estimate will be as follows.

If the $j$th cluster of the population is selected, we shall subsample $n_j$ elements
from it. The total of the values of the characteristic for these $n_j$ elements we
denote by $x_j$. Furthermore, we denote by $n_i'$ the total number of elements
subsampled from the $i$th stratum, or, what is the same, from the cluster selected
from the $i$th stratum, and by $x_i'$ the total of these elements. Thus, if the $j$th
cluster is the $i$th selected, then $n_i' = n_j$ and $x_i' = x_j$. We define our estimate $x''$
of $x$, the total of the population, to be

$$x'' = K(x_1' + \cdots + x_m'). \tag{5.14}$$

Then, if $K = P/mn$ and $n_j = nN_j/P_j$, it is easy to see that $x''$ is an unbiased
estimate of $x$.
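Unbiasedness of $x''$ can be confirmed by exact enumeration of the $P_m$ equally likely starts. The sketch assumes that within a selected cluster the $n_j$ elements are drawn by simple random sampling, so that the expected subsample total is $n_jX_j/N_j$, where $X_j$ denotes the total of $C_j$ (our notation). The sizes, totals and $n$ are hypothetical and were chosen so that every $n_j = nN_j/P_j$ is an integer not exceeding $N_j$.

```python
from fractions import Fraction as F

def selected_clusters(sizes, m, start):
    """0-based clusters chosen by rule (5.1) for the start P' = start (P_h <= P_m assumed)."""
    interval = sum(sizes) // m
    cumulative, running = [], 0
    for s in sizes:
        running += s
        cumulative.append(running)
    return [next(h for h, c in enumerate(cumulative) if start + i * interval <= c)
            for i in range(m)]

P_h = [2, 4, 2, 4, 2, 4]       # hypothetical measures of size, P = 18
N_h = [4, 12, 6, 8, 2, 12]     # hypothetical numbers of elements N_j
X_h = [10, 30, 9, 20, 5, 24]   # hypothetical cluster totals X_j
m, n = 3, 2
P = sum(P_h)
K = F(P, m * n)                                        # K = P / mn
n_sub = [n * N // Ps for N, Ps in zip(N_h, P_h)]        # n_j = n N_j / P_j
assert all(n * N % Ps == 0 and k <= N for N, Ps, k in zip(N_h, P_h, n_sub))

interval = P // m
expected = F(0)
for start in range(1, interval + 1):
    # Under simple random subsampling (assumed), E(x_j | C_j selected) = n_j X_j / N_j.
    clusters = selected_clusters(P_h, m, start)
    expected += K * sum(F(n_sub[h] * X_h[h], N_h[h]) for h in clusters) / interval
print(expected, sum(X_h))      # the two agree: x'' is unbiased for the total
```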
The variance of the estimate. We may calculate the variance of $x''$, where

$$x'' = \frac{P}{m}(\tilde x_1 + \cdots + \tilde x_m) \quad \text{and} \quad \tilde x_i = x_i'/n. \tag{5.15}$$

Now, by Theorem 2,

$$\sigma^2_{x''} = E\sigma^{*2}_{x''\mid A} + \sigma^2_{E^*(x''\mid A)}, \tag{5.16}$$

where $A^*$ is the random event whose states are the possible samples of clusters.
We shall not evaluate $E\sigma^{*2}_{x''\mid A}$, since this involves no new problem for
subsampling methods using random or systematic selection, or methods using
probability proportionate to size.

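From (5.15) it follows that

$$E^*(x'' \mid A) = \frac{P}{m}(\bar x_1' + \cdots + \bar x_m'), \tag{5.17}$$

or, in other words, $E^*(x'' \mid A)$ is the estimate we would have if the clusters in the
sample were completely enumerated; here $\bar x_i'$ denotes $\bar x_{i\alpha}$ when $C_{i\alpha}$ is the cluster
selected from the $i$th stratum. We shall denote the second term of (5.16) by $\sigma_R^2$. Then

$$\sigma_R^2 = \frac{P^2}{m^2}\left[\sum_{i=1}^{m}\sigma^2_{\bar x_i'} + \sum_{i \neq j}\sigma_{\bar x_i'\bar x_j'}\right]. \tag{5.18}$$

Now

$$\sigma^2_{\bar x_i'} = \sum_{\alpha=0}^{k_i}\frac{P_{i\alpha}}{P_m}(\bar x_{i\alpha} - \bar x_i)^2 = \sigma_i^2. \tag{5.19}$$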

To calculate $\sigma_{\bar x_i'\bar x_j'}$, $i \neq j$, we shall use Theorem 1:

$$\sigma_{\bar x_i'\bar x_j'} = E(\bar x_i' - \bar x_i)(\bar x_j' - \bar x_j) = E(\bar x_i' - \bar x_i)\,E^*[(\bar x_j' - \bar x_j) \mid \bar x_i']. \tag{5.20}$$

To calculate $E^*[(\bar x_j' - \bar x_j) \mid \bar x_i']$ we begin by noting that

$$E^*[(\bar x_j' - \bar x_j) \mid \bar x_i'] = E^*[(\bar x_j' - \bar x_j) \mid C_i], \tag{5.21}$$

where $C_i$ is the random event having $k_i + 1$ possible states, which are the
selections of $C_{i0}, \dots, C_{ik_i}$ as the sample clusters of the $i$th stratum. Now if $C_{i\alpha}$ is
one of the clusters of the $i$th stratum, let us calculate

$$E[(\bar x_j' - \bar x_j) \mid C_{i\alpha}]. \tag{5.22}$$
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We begin by determining which of the clusters of the jth stratum are possible 
sample clusters, if we know that C;. is selected from the ith stratum. Since the 
sizes of strata 7 and 7 are both P,, it follows that there exist integers 8) and 8; 
such that 


Pp tere + Pisga S$ Pat ess + Pie < Pat s+ + Pi, 
and 
ra? *** + Pama < Fat +++ + Po S&S Pa + + + Pja,. 


Hence, if we know that C;. has been selected from stratum 7, it follows that we 
must select one of the clusters 


C iso ? C jgo4i ne C53, 
from stratum j and 


P{C3g is selected | C';. is selected} 


Pi3/Pia, 8 = Br, Bo +1,°**,B: 
= 0, otherwise, 
where 
P45 =Partere t+ P 52, — Pa —-°-, Pian 


Ps = Pis3,8 =B+1,-°--,B—1 


/ 
P 58, = Pa + a + = P x eye ae P js,-1 ’ 
and 
81 
, 
pe P is — P sa 
B=8 9 
Then 
~ wal = ee =S - 
(5.23) E\(%; — 4; | Cia) tia — 2X; 
where 
Pi P" 
- O° ~ 1B 
(3.27) Lila = } a = js. 


3=30 Pt 


Hence, substituting in (5.20), we see that 


° an? = anit = 
03,2; = E(Z; — 7:)(Zs — 75) 
, 


ji = Fa if Cia is selected from stratum 7. Then it follows that 


where 2 


kg P. 
/= ~ cles 1a — = — = 
(5.25) ons) = Do (Bia — Fe )(Fjja — F}). 
a=0 m 


Obviously, the conditional expectation can be eliminated from (5.25) by using 
(5.23) but no gain in simplicity or generality thus occurs. 
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It would be possible to obtain the variances and covariances of the x; by 
listing all possible samples in any special case. To make this general would only 
require writing the necessary notation. 

Substituting in (5.18) we see that 


( m 


PF, {2 a +X ox} 


= 


where o3;3; is given by (5.25). 


m 


It follows that if we use the fact that >. (z,; — %) = 0, then we have 


i=] 


k ki 
Op _ P 2 eS 5 * (Fie = x) + P*, X eS ~ (Bia aati E)(Zj0 ea Z), 
i=l a=0 ™m 14) a= 
or, returning in part to the “unstratified” notation 
ars 2 P? SP, ,- ial ia / 
(596) «3 = — >, —* (2, — 27 + SY -* (Zia — E)(Fjja — 2). 
m h=1 P m ixj a=0 
By combining terms of the second part of (4.26) generalizations of the formulae 
obtained in [1] are easily obtained. 


Still another means of writing o% is 


1 
- om sot ad et 
(5.27) fly — gb. + Le _* (Z; X)(Xj\a-Z) ? 
where 
m 
2 1B gs 
a. = \o:— rm) ’ 
Mm i=l 


which shows both sources of changes in efficiency as compared with sampling 
with probability proportionate to size, and replacing the clusters obtained. (It 
is, of course, obvious that P’s"/m is the variance of E*(x”’ | A), if we assume the 
m clusters to have been selected with probability proportionate to size, each 
selected cluster being replaced before the next is selected.) 

By considering (5.26) and (5.27) it is clear that systematic sampling with 
probability proportionate to size will be more efficient than sampling p.p.s. 
with replacement under much the same conditions as when we sample single 
elements. The details are omitted. They depend on applying the Lemma and 
Theorem +. The summary of the conditions is: If we sample systematically with 
p.p.s., and if the two sets 21, --- , ®%,; and y;;, «++ Yj; are monotone, one being 
monotone non-icreasing and the other monotone non-decreasing, then the covari- 
ance between the ith and jth strata will be negative, and thus gains made as 
compared with independent sampling from the strata. 

If we define 


0 en ~ 
Cas = (Zia — GF ia) (Z 58 — &% 58) 


th 
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then the concavity condition for systematic sampling p.p.s. to yield a smaller 
variance than independent sampling p.p.s. from each stratum is, if a < 8, 


0 0 0 0 0 ‘ 
Jal — Oy < Fa2 — Fy2 < we < Fak; ~ Cyk; < 0. 


6. The systematic sampling of clusters of equal sizes. Let us now suppose 
that our population consists of clusters of elements, the clusters being of equal 
size, i.e. containing the same number of elements. To be specific, let the popula- 
tion consist of M clusters, where M = cm and each cluster contains N elements, 
where N = kn. Then, the value of the characteristic being measured for the 
ath element of the zth cluster may be denoted by zia, and the total of all the 
elements of the 7th cluster may be denoted by z;. The arithmetic mean of the 
population is Z, and thus 


Mi = y 3» &. 
i=] 
where 
N i. => Zi . 


a. Complete enumeration of clusters in sample. First, suppose that we wish to 
estimate Z by <’, where Z’ is the arithmetic mean of the sample obtained by 
selecting a systematic sample of m of the M clusters, and enumerating all elements 
within each cluster in the sample. Then, we may write 


(6.1) mz! = Do, 


i=] 


where Z; is the mean of the ith cluster selected for the sample. From [1], it follows 
then that 


2 ob , : 
> {1 + (m — 1p} 


M 
where Mo} = pa (%;. — #)’, and #, is defined as f, in [1, p. 6], but with Z, in 


i=1 
place of x;. Now from the theory of the random sampling of clusters it follows 
that 


2 
o 


Vv {1 + (NV — 1)p} 


o% = 
2. ‘ . ‘ 
where o is the variance of the population, i. e. 


M N 
MNo = >. >d (2; — 2 
i=1 j=1 


and p is the intraclass correlation coefficient of elements within clusters, i. e. 


2 2 2 
op = 0, —0,/N —1, 
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where 
Vu No 
MNow = 2, 2, (ej — 2)”. 
i=1 j=1 
Thus 
6.2) oi = {1 + (N — Vp}{l + (m — Vie}. 
di 


Of the three factors in (6.2), o /mN is the variance of a random sample of size 
mN selected with replacement; 1 + (N — 1)p is the factor arising from the use 
of clusters; and 1 + (m — 1) -, is the factor arising from the fact that the clusters 
are sampled systematically. 

b. Stratification and subsampling. When we consider the possibilities of stratifi- 
cation and subsampling, the number of possible designs increases tremendously. 
For example, it would be simple to calculate the variances of arithmetic means 
obtained by stratifying the population, selecting sampling units with probability 
proportionate to size, subsampling systematically, again subsampling systemat- 
ically and finally subsampling at random. However, such studies may be left 
to be made in connection with the practical problems in which they are to be 
used. Rather than attempt to consider many of the possibilities that might 
arise in practice, we shall here give only the results of the systematic subsampling 
of a systematic sample. The variances of many other designs may easily be ob- 
tained by means of Theorems | and 2. 

Suppose now that from each of a systematically selected sample of m clusters 
we subsample, systematically, n elements. Then, let our estimate of Z be Z’ 
where, if 2. is the ath selected element from the ith sample cluster, then 


m 


ie (2) ie ee 


mn i=] a=] Mm i=l 


" | Sy 
t= (-) Dd aie. 
n a=1 


From Theorem 2, it follows at once that 


and 


(6.3) o: = ae, £0 + (N — L)pt{1 + (m — 1A} + pa te + (m — lpi}, 
mN M =i mn 

where a; is the variance within the ith cluster and p; is the average serial cor- 
relation within the zth cluster as defined in [1, p. 6]. It is simple to calculate the 
variance of z’ also when the sub-sampling is done by considering the m clusters 
in the sample as one population from which a systematic sample is selected. This 
is the case that occurs when a sample of blocks is selected and all the households 
on the sample blocks are listed serially, a systematic sample then being selected 
from the lists. However, for our present purposes it is the analysis of (6.2) that 
is important and we now turn to a brief discussion of (6.2). 

The most important conclusion to be drawn from (6.2) is that the systematic 


1¢e 


SYSTEMATIC SAMPLING 353 


selection of clusters even when systematic selection is desirable, may not com- 
pensate for the increase in variance caused by the use of clusters. Systematic 
selection will provide the same relative gains but these gains may not be large 
enough to produce the inequality 


MN — mN 
MN —-1— 


A problem that we have not worked through is the following: By regarding 
the elements of the population as random variables, we obtain conditions on the 
average correlations among elements of a single cluster as well as on the average 
correlations among elements of different clusters that enable us to state where the 
systematic sampling of clusters of equal sizes may be expected to yield a smaller 
variance than the random or stratified random sampling of clusters or of indi- 
vidual elements. This solution should be straight forward. 

c. Systematic sampling in two dimensions. Systematic sampling in two dimen- 
sions occurs in such practical problems as the selection of a sample of blocks from 
a city or the selection of a sample of plots from a field. 

In selecting blocks from a city, the procedure most often followed effectively 
reduces the problem to one dimensional form by first numbering the blocks of 
the city or a part of it, in serpentine fashion beginning, say, in the upper right 
corner of a map of the city and numbering the blocks in the top row from right 
to left continuing the numbering of the second row from left to right and so on. 
Then a systematic sample of these block numbers, and hence, of the blocks 
themselves is selected. Clearly, this procedure should not be the most efficient 
if neighboring blocks are highly correlated, since, to cite an unrealistic possi- 
bility, the possible samples might turn out to be columns of blocks of the city. 

A second two dimensional systematic sampling procedure might be that of 
selecting a systematic sample of the rows and a systematic sample of the columns, 
thus obtaining a grid sample. This design too is inefficient when there is a ‘‘fer- 
tility gradient” along rows or along columns. 

The reason for the inefficiency of both of these procedures can be found by 
examining the formulae for the variances of systematic samples. If the numbering 
is serpentine, then it becomes illogical to expect that the correlogram is concave 
upwards and sharp deviations from that pattern may occur. In the grid design, 
which is a special case of the systematic sampling of clusters with systematic 
subsampling, we may examine (6.3) and note that the intra-class correlation 
coefficient p may be large enough for o% to be large even when A; is negative. 

Clearly, (6.3) suggests that the possible samples be so defined that p is as 
small as possible. In square fields this might be attained by defining the possible 
samples to be plots of a Knut Vik square having the same treatment, and sim- 
ilar definitions of possible samples could easily be given for irregular fields. 
This subject is, however, left for further study.® 


{1+ (WV — Ipjtl + (m — 1p} < 


3 One of the referees of this paper has drawn the author’s attention to an article {6}, 
the data of which, especially Table 3, are in accordance with the opinions expressed above. 
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PROBLEMS IN PLANE SAMPLING 


By M. H. QuENOUILLE 
Rothamsted Experimental Station, Harpenden, England 


1. Summary. After consideration of the relative accuracies of systematic and 
stratified random sampling in one dimension the problem of estimation of linear 
sampling error is discussed. 

Methods of sampling an area are proposed, and expressions for the accuracies 
of these methods are derived. These expressions are compared for large samples, 
with special reference to correlation functions which appear to be theoretically 
and practically justified, and systematic sampling is found to be more accurate 
than stratified random sampling in many cases. Methods of estimating sampling 
errors are again considered, and examples given. The paper concludes with 
some remarks on the problem of trend in the population sampled. 


2. Accuracy of systematic and stratified random samples in one dimension. 
W. G. Cochran [1] has given expressions to the variances of the means of samples 


of size n drawn from a population x;2%2 --+ Xn, when the method of sampling is 
random (r), stratified random (st) and systematic (sy). He assumes the elements 
2102 *** Xnx to be drawn from a population in which 


E@:i)=n, E@i-vy =o, E@i-—u) tiv — 4) = a 


where pu > py > O whenever u < v, and derives the expressions 


‘ e 1 9 kn-1 
= n (: o 4 E ~ kn(kn — 1) » ~S ‘dps| 


1 2 ¥ i 
1-))t- ey 2, ( - woe 


kn-1 


9 a) n—1 

. 1 - ink = 1) a (kn — u)pu + var D 2 (n — pu. 
Using these expressions which are linear functions of the p, Cochran compares 
the relative efficiencies of the methods of sampling for several types of correlo- 
gram. It is worth noting that (1), (2) and (3) can be derived under more general 
conditions than Cochran considered. If we assume that (a) each x; is a sample 
from a population with mean y; and variance oj , (b) that y; is distributed about 
mean yw with variance o , (c) that E(u; — wu) (uj — vw) = pio’, and (d) that 


1 kn—u 


——- Pi.izu, then it is not difficult to show that (1), (2) and (3) 
cane i=] 





Pu 
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' al ese i} is: ; 
require the addition of a superposed variation —{ 1 — a >. o; to the right- 
n . cn i=l 


hand side of the equations. Thus it should be remembered that Cochran’s 
results give theoretical maxima to the relative efficiencies of the various methods 
of sampling, while p, is the mean correlation between samples wu apart. This 
result is perhaps interesting in connection with sampling for say, insect infesta- 
tion, when at each point there will be a mean level of infestation and the sample 
will be distributed in a Poisson distribution about this mean. Then the superposed 


Variation is 
I 1 | 1 
ee ee ee 
n ( ') ke 1” n ( 1) 


ce . ° 1 ° 
If we are sampling a continuous process , for n large we can write down the 
integral equivalents of (1), (2) and (3) 


| : 2 2 ‘| 
n d* Jo 


) 


= 


(5) a a . E — 7 | p. Ou + 2 - pan | 
a 0 


n u=1 


where p, is the mean correlation between successive elements of the sample, wu 
apart and d is the mean distance between samples. We have thus 


c “ ‘) il " 20 ) 

ae 8] ~ { e e 

— eo ~~ = pudt + [ Pudt — d Ze Puu hy 
Ge; d /0 d Jd 


om] 


which can often be used to investigate, quickly and roughly, with the aid of a 
graph the difference between the efficiencies of stratified random and systematic 
sampling. Figure 1 shows how this is done for four types of correlogram. 

For a continuous Markoff scheme, we have p, = p” and 


‘ o 2 2 2p | 
ge ~—j)1+—t+ —, — <= I> 
-» log p’ = (log p*)*— (log 4)? | 


ss es” ) 2p? 
oy ~—|1+ — , + ——], 
n log p' 1 — p? 


which agree with Cochran’s results. 





3. Replication and the estimation of error. Yates [2} has pointed out the 
difficulties attached to the estimation of error for a systematic sample. It will, 
however, be worthwhile to investigate this point using the above formulae. 

' In practice we can sample a continuous process only as if it were a discontinuous process 
with & large. 
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For random, stratified random and systematic sampling, if n is large and k is 
regarded as constant, then the variance of the estimate of the mean will be of 
the form o F(k)/n, where F(k) is virtually independent of n. Thus, if we have 
any method which provides an estimate of error for the samples it will be possible 
to split the series to be sampled into several equal parts (or blocks) to obtain an 
estimate of error of the mean of each part and to combine these to obtain a more 
accurate estimate of the error of the overall mean. In fact, if » is very large, we 
may Wish to reduce our number of observations by obtaining estimates of error 
from a random selection of these parts. For stratified random sampling, F(/:) is 
completely independent of n, so that we may combine our estimates of error from 
each strata. This leads us to the commonly used method of taking g randomly 
chosen elements per strata, and combining the sets of variances of g — 1 degrees 
of freedom to form an estimate of error. If we make our samples exclusive, 
i.e. no two elements can coincide, then this variance has to be multiplied by 
1 — q/k to give the estimated variance of the sample mean. 

We can in the same way estimate the variance of the mean of a systematic 
sample by using sets of g systematic samples of sufficient length with randomly- 
chosen starting points. This sampling will, however, be more difficult to carry 
out in practice, and we might consider other methods. Our systematic samples 
may be chosen to be invariable in each part or block into which the series is 
split so that our sampling procedure involves, in all, only q systematic samples, 
or we might follow the method advocated by Yates of choosing our q samples 
to be evenly spaced, so that they are subsamples of a larger systematic sample. 
Whereas this latter method has simplicity and its possible incorporation into a 
more extensive scheme to recommend it, its use has to be very carefully con- 
sidered. If we consider the discrete case, we wish to estimate 


9 2< 2k < 
(6) 0 (1 --— 2 nt 2 pss), 
k-1 u=1 k-1 u=1 
but any estimate of variance based on q evenly-spaced systematic samples can 
contain only terms of the form pzu;¢, and while an estimate of variance based 
on g randomly-chosen systematic samples will obviously be limited, it will, in 
most cases, be more representative. As an example, suppose we take / = 16 
and q = 4 then we can compare the relative occurrences of observing the correla- 
tions p; --- pis In the estimate of variance. Six examples of this are given in 
table 1, the random numbers having been drawn from Fisher and Yates tables; 
pu and pie_, being shown together, since they occur equally frequently. The 
table demonstrates how randomly-chosen samples, even as nearly systematic 
as the first two randomly-chosen samples will avoid systematically sampling the 
correlogram. It is obvious that in most cases either method will be fairly good 
but the use of this latter will usually be the more accurate. Comparisons are 
made in table 2 for various types of correlogram using the samples indicated 
in table 1. It is, of course, possible to postulate theoretically many kinds of 


? Throughout this paper 6 is used for the differential sign to prevent confusion with d. 
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correlogram for which the equal-spaced sets of systematic samples will break 
down, but ultimately we must decide with reference to the types of correlogram 


TABLE 1 


Frequency of occurrence of the serial correlations pi , pz ... pis in the estimate of 
variance when 4 systematic samples each with spacing . 16 units | are taken 














: 7 
i | 4 systematic s amples with random starting 
| 4 evenly- points at 
| spaced Total fre- 
, | systematic [|~ l —) _ | quencies 
| samples | 4,7, | 3,7, 3, 6, 4, 6, 2, 8, 2. 6, 
| | 8, 12, | 8, 12, | 10, 13, | 7, 14, | 11, 15, | 11, 16 
sais || _ amine es = Dales l= ccna dlc = sniesasinde 
1,15 | ; 1d} ot } 1 | | 8 
2,14 | | | } 1 | 1 2 
3, 13 1 zi) 1 2 6 
4, 12 4 2}; 2] 1] 1 1 7 
5, 11 1 2 | 2 5 
10 | 1 1 1 1 4 
7,9 | 1 | 2 1 2 1 7 
8 t 2 + 








TABLE 2 


15 
Values of #; = py as estimated by systematic samples 
u=1 





Evenly- 








pr Systematic samples with random starting points li Ex- 
systematic ——_ F — pected 

| samples | 1 2 3 4 5 6 
1-0.2 u, (u = 1, .5) 0.17 | 0.27 | 0.20 | 0.17; 0.30 | 0.17; 0.13; 0.21 | 0.27 
1-0.1 uw, (u = 1, .10) 0.53 | 0.62 | 0.58 | 0.53) 0.60 | 0.53 0.53 0.57 0.60 
a 0.04 | 0.13 | 0.12 | 0.06, 0.15; 0.06, 0.07; 0.10 | 0.13 
2-04 | 0.58 | 0.66 | 0.64 0.60 0.66 | 0.60 0.60, 0.63 0.65 
Kendall’s Series 1 —0.14 | 0.03 | 0.00 |—0.05 0.16 '—0.05 —0.05 0.01 | 0.07 





* Naturally the use of this method of estimating the sampling error assumes that the 
correlation between the corresponding elements in each part or block into which the series 
is split may be neglected, i.e. in this case that the terms pis and above are negligible. In 


this case pis = 1/16 and consequently the term 2(;, = pu—if 2 pis.) = 0.56, required in 
u=1 u=1 


15 
(6) differs slightly from the term #; = Pu = 0.65 which we are attempting to estimate. 


experienced. We shall consider this point further, after we have dealt with 
two-dimensional sampling. 


4. Methods of sampling in 2 dimensions. The number of ways in which we 
can sample a two-dimensional space’ is large, since we can employ random, 


3 We shall, in general, consider our two-dimensional space to be rectangular, but it is 
not difficult to draw similar conclusions for an area of any shape. 
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stratified random or systematic sampling in either direction. Thus we will be 
able to consider every possible combination of these methods, e.g. random in 
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Fic. 1. Graphical comparison of the efficiencies of systematic and stratified random 
sampling for various correlation functions. The thick line gives the function 


fi(u) = up./d, os esd 
= Pes d<u, 

and the dotted line the function 
fo(u) = pia, (i —1)d < u < id. 


Thus systematic sampling is more or less efficient than stratified random sampling according 
to whether the area under the thick line is greater or less than the area under the dotted 
line. The most efficient method is indicated on each graph. 


one direction and systematic in another will be denoted by r-sy. Furthermore 
we can consider the sets of samples in one direction to be aligned with one 
another, or to be independently determined. The suffix 1 will be used to denote 
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Kxamples of several methods of sampling 
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Fic. 2. Methods of sampling a field. In this case,m = m = k = k = 3. 
o the 


aligned samples while suffix 0 will denote independent samples, e.g. we might 
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= 5. Accuracy of sampling in two dimensions. Suppose we consider s saruple 


of nn: elements drawn from the elements x; ;(¢ = 1,2, +--+ nit: ,7 = 1,2, --+ mk), 
(which form a single finite population drawn from an infinite hypothetical 
population), such that the mean spacing in the two directions is /; and / 
These parameters will, if necessary, be indicated in brackets after the method of 
sampling, e.g. m18yo(rals 5 mele). 
Let. X denote the mean of a sample formed by the method considered, and 
x’ a member of this sample. Suppose, also, that the z;; are drawn from a popula- 
tion in which 


Helen creer en seein ailenseaechannsoe! 


9 9 


E(vi;) = p, E(vi; — pw) =o, 


Co , Litujtv — a ie ce 
E U; bt) (x; 4, j+u KL) = Pijuvd 
Further we may average p;j.. over all possible values of 7 and 7 to define p,,,. = 
C ° 
p-u.-» bY the relation 
= Zz Piju = (ky m — |U |) (hee Ne — | v pur. 
x aS 


The purpose of these definitions is to allow to eliminate the difficulties associated 
with the parameters of finite populations by considering this population as 
being itself a sample from an infinite population. Cochran employs a similar 
device. 

5a. Random sampling. It is not difficult to see that 


o(X) = } E(X: — X,)* = E(X, — w)* — E(X: — w) (X: — »), 
where XY, and X»2 are independent samples. 
Also 
E(X; — p)(X2 — uw) = E(x; ter u) (29 — p) 


SS (him — jul) (ene — |v Dove 


hey hs 714 Tle 


o ; ] 
= on 
hy hs Ny Me 
where the double summation’ exists over the region S given by |u| < kin, 
iv | < iene and excludes vu = v = 0. We thus have to evaluate F(X, — yu) for 
the different types of random sampling. 
It is easily shown that 


l 


E(N, _ \ = 


; E + ny ne — I 5 - = (kyr, — | u 1) (Ire ne — |» | 


he; ho My Mol hy hes Ni Noe — 


for fofs, 


o l 


~~ oo ; | 
wa’ 2 | 7 (/: yi = 1 \\ (ko No — | DY ) Pur 
mn | ky | nik, hee mn. — 1) Zz 2. ‘ Ne Noe |v | 


2(no — 1) kono 
- poeraneennion ke ne — VU)Por 
hey No(kon2 — 1) dy (kets — v)p | 


v=] 





ing ~ ; a 
4In general, unless otherwise stated, double summations will exist over the region for 
which the coefficients are positive, excluding u = v = 0. 
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for Tif, 
o (my — 1)(m — ae 
— k — | yi — |v uv 
Ny Ne E + ky ken no(ki I} meet — 1) x x ham | u |) ane |? lp 
2nm—-1) BW, %m,-—1) % 
+ kana(kom — 1) a (ka Ne v) pov + * Ee - x (ky nm U) puo 
for "11, 
whence 





: 1 ] 2 
~~ ' Pa 1 ps - 
. (ro r) Ny Ne ( ky ) - 


1 —- — . pa 1 Zz. > (ky mn | Uu |) (ke No — |v Dow 


ky key ny No( ky ko N41 Neo 
2 1 1 9 ky ko No— 1 
Pi = 1-—- — F cee mere oe 
re ny ( ky ) ° [1 - (ky ke — 1)ky ke ny no(hi ke ma ne — 1) 
(8) DD (kim — | u|)(keme — |v [pur 


2(n2 i 1) kone 
Ia Kg M2 — v Ov 
hr No( ke Ww 1) dX ( 9 No 2 )p | 


o(n7;) = l i cae os) e E —~ _ rkelm + me — 1-1 m 
_— Ny Ne ky ke (ky ke — L)hy hoo ny no( hy ke mie — 1) 


2hi(m. — 1) 
(9) > 2 (hy ny a | u |) (Ke no — |v pu» + ; ko ie 1)no(ke ihe 1) 


(7) 

















¥ (ko 2 — v) ___ Shalom — 1) i (ky ny — wu) 
v=1 si Pov (ky ke —_ L)ni(ki ny —_ ‘d) u=1 er ss 


5b. Stratified random sampling. We can deduce the variances for some methods 
of taking stratified samples if %; , the mean of the elements sampled in the 7th 
stratum, is independent of Z; , since we will then have 


E(X ~ #) = E(%; — &,)°/n, 


where Z is the mean of the finite population which is sampled. Hence 
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1 1 2 1 
= Le oo “ 
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‘ l 1 J 
2 (st = 1 — —— }o?| 1 — ——__—___. 
o (st ro) nN Ne ( ky ) . | ky ke no(ki kz — 1) 


(11) > (a — | w |) (kone — | 0 | pur 


2h, (nz sas 1) un 
i Ir = uv ’ 
(ky ko — 1)ne(ke ~— 1) a ( ™ v)p 


, l 1 2 s 
oe - _)¢|1- —_,—_.. 
o (sto sto) N1 Ne (1 hy :) - | ky ko(ky ke — 1) 


>> i (ki — | w|)(kg — |v Dou | 


To estimate the variance of other methods of sampling, we will make use of a 
general formula which we might have used to derive the expressions (8)-(12). 
If x; is any element of the sample X, then 


(X —7)°= a [>> (xi — #)* — DS (ai -— X)*] 
cies (x! _ 7 - ny N2 — —— bl 
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o(X) = E(X — 2) 
hy k 2Ny Ne ‘2 9 1 
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Ky ken ne 


, o—1 2 = et , 
(13) * (hen, — |v Dow sen erst os + eornmie = ; E(x; re a) (x; a us) 


Ny Ne Ny Ne 


1 l ° 1 
= —{1 — ——-Je]1 —- ——_—_—_—— i 
ny Ne ( key =) ' | iy he m no(ky ke — 1) Dh Lb (kim — | w|) 


5 (he ng — | Vv |) Pur + ee “- ae Ea —— Hw) (a Lam 1a 0) 
Mike — 1 o* 
Thus, provided that we can estimate E(x; — p) (x; — »)/o° the expression (13) 
gives the error for all methods of sampling. 

As an example, we might deduce the expression (12). If we choose any member 
zt; , then a second member 2; will be located at random with respect to x; except 
that there will be i; — 1 positions in the same stratum as 2; that a; will not be 
able to occupy. Thus the expected correlation E (x; — pw) (a; — w)/o’ will be 
given by 


(14) aH) EL ham = [wlan = 1 Dow 


hry kg ny no(ny Ne 
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— tam od hh i — |e = |e Dew. 
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If we substitute (14) into (13), we will obtain expression (12) for the variance of 
stjsty . In the same manner, We can derive for stsf; the expression 
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L . ‘ s ' 
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Thus we can evaluate o (X) for all types of stratified random sampling. 

5e. Systematic sampling. In a similar manner to that used for stratified random 
sampling, we can use (13) to evaluate the variances of systematic sampling. 
- / ! , . . 7 . ° 
Values of E(x; — w) (x; — uw) for three of the possible methods of sampling are 
given below. For syisy: 


/ , , | 
(16) E(a; _ bp) (2; — pL) a ad 
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of »)]- ke 
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~~ Du (kz — v)po» + on DD (hi — |u|) (me — |v [puter 
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hy ‘ua 


The derivation of (18) may be compared with that of (15). 


| 6. Effect of alignment. We can examine the effect of alignment either by 
an examination of the values of the variance of different samples, or by the 

ul) | direct use of (13). For random and stratified random sampling, the effect of 
| alignment is to increase the variance of the sample by an amount 


22 Auv(pov as Puv) + 22 buv(puc — Puv) where Quy => 0, 
bus = 0. 


Mv This will be positive for monotonic decreasing correlation functions, and for the 
majority of functions realised in practice. Thus alignment will usually increase 

| | the variance for random and stratified random samples. 

ty For systematic samples, the position is more complicated, but, roughly, the 
variance is increased by an amount 


om | zz Gue(Pt,u,deo ca Bu,u.deo)s 

ng. | z 

are | where a,» > 0 and jx,u,x.» iS @ mean over a rectangle, centre px,u,i.0 for u and v 
| 2 


non-zero, and is a mean over a line, length /; centre po, +.» for u zero, (and similarly 
for v zero). Whether this is positive or negative will depend on the correlation 
_—) function, and it will have to be investigated for the types of correlation function 
; which are encountered. 


7. Limiting forms. For a continuous process, when m; and ne are large, we 
may, in the same manner as for linear sampling, obtain integral approximations 
to the sampling variance, provided that 22 pa,u,a,.. converges. 

We thus have 
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> r-) 9) © 
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‘d oo 0» OV 
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(26) o (sysy) ~ : > Payu,dgv — >> | i Puy dudv |, 
Ny Ne Uu=—0O y=—0 dds — 3 I— 


2 o” ] © pay | 
o (syo syo) rw Ny No l! = d? ds z - (d, =— ie |) pu» dudv 
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di v=—o dy — dy 


8. Particular case where p.., = pup». We note that, if pu» = pupr® most of 
these forms can be simplified greatly. If we write 


2 
sy, = 1 — =. pudu + 2 a 
dy Jo 


u=! 


¢ 2 ’ 
st, = 1 —- 2. (d; — u)pudu, 
dy 0 


with similar forms for sy, and st, , and, also 


9 2 9) do 00 

. sand a af “= \ 1 ‘ 
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* A sufficient condition for this to be a valid autocorrelation function is that both pu 
and p, should be autocorrelation functions. 


eee. 
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then we have, for example, 


(28) o (rim) ~ ae (1 + fi), 
1 Ne 
2 
2 oC 
ww —— (] 
(29) o (ri11) aie (1 + fi + fe), 
(30) a (sto Sto) ~ £.. (stu sty + stu+ sty), 
NM 
2 
(31) o'(sti sti) ~ —— (sty Sty + fi stu + fo str), 
1N2 
2 
(32) o' (sy: sy:) ~ —— (syusyr + fisyu + fosyr), 
Ny Ne 
2 
(33) a (syo syo) id ome (stu sty + fi SYu + fr SY»). 
NN 


From these we get 


2 
2 2 
« t t 
(34) o (st; st) o (sy: sy) ~ om 


- [(stu sty — sYyusyo) + filstu — syu) + fo(st. — syr)], 
2 
2 — 2 ees o 
(35) o (syi 8yi) — o (Sto Sto) a 


: [a om syu) (1 — 8Yr) = (1 = sty) (1 = Sty) } + fi sy, + fosyrl, 


(36) o°(sto sto) — o (syosyo) ~ —— [filstu — syu) + fo(st. — syr)]. 
1/%2 








The forms (34), (35) and (36) enable us to compare the variances of the samples 
in two dimensions by using the one-dimensional results. For most practical 
cases, we know that the f’s are positive, st, > sy, and sty) > sy», so that 


(37) o°(stst;) > o° (sysy:) > 0° (stosto) > o°(syosyo). 


The values of o°(stosto)/o°(roro), 0° (sy:sy1)/o°(roro), a (syosyo)/o°(roro) and 
o (stosto)/o°(syosyo) for Pau = p\ and Pig» = ps” are given in table 3. It is not 
difficult to show that for a given number of samples, (d; , do fixed), o°(stosto), 
o (sy:sy:) and o°(syosyo) are least when p; = p2. The expressions tabulated have a 
value of 1 for p: = p2 = 0 and tend to limiting values of 0, 2/3, 0, and 2 respec- 
tively as p; and p: tend to 1. It is interesting to note that for p; and p» differing 
by more than 0.4 the grid imposed by sy;sy; is less efficient than purely random 


sampling. The type of function p.» = pup»® is, however, less likely to be realised 
6 For a town survey, we might find the correlation between two points depending on a 
within-streets and a between-streets correlation, so that this function could be realised. 








TABLE 3 
Comparison of the efficiencies of “orm and random sampling for various values of p; and pz 
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1.000 1.000 1.000} 1.000 1.000] 1.000; 1.000} 1.000 1.000) 1.000) 1.000 
1.000 1.000, 1.000; 1.000, 1.000) 1.000; Lom t 1.000 "nr 1.000) 1.000 


| | ee | 
| 0.720 0.669) 0.632 0.601 0.575 5511 0 0.529; 0.508) 0.489) 0.471 
0.1 0.739 0.754) 0.827 0.956! 1.160] 1.. 


0 

1.488) 2.055| 3.215 6.734, = 
0.596 0.534) 0.493 0.462] 0.437) 0.416] 0.398 0.382) 0.367, 0.354 
1.21 | 1.25 | 1.28 | 1.80 | 1.31 | 1.32 | 1.33 | 1.33 | 1.33 | 1.33 











.609| 0.565, 0.529) 0.497) 0.469) 0.443) 0.419) 0.396, 0.375 

0.2 0.706) 0.721) 0.788} 0.914) 1.134] 1.532) 2.362, 4.911} © 
0.462) 0.416) 0.380) 0.352) 0.328) 0.307, 0.289) 0.272) 0.257 
1.32 | 1.36 | 1.39 | 1.41 | 1.43 | 1.44 | 1.45 | 1.46 | 1.46 








~ 
am 





0.516, 0.476] 0.441. 0.409 0.380 0.354 0.328) 0.305 

0.3 0.689, 0.707) 0.778 0.924) 1.209 1.825) 3.751) « 
0.365, 0.327] 0.297, 0.271) 0.249 0.229 0.212) 0.196 
1.41 | 1.45 | 1.49 | 1.51 | 1.53 | 1.54 | 1.55 | 1.55 


— 














0.432] 0.394) 0.360, 0.329) 0.300 0.272) 0.247 
0.4 0.680) 0.702) 0.787, 0.983) 1.437, 2.900; % 

0.288) 0.256; 0.229, 0.206 0.185 0.167, 0.151 

1.50 | 1.54 | 1.57 | 1.60 | 1.62 | 1.63 | 1.64 


— 








| 0. 354 0.317, 0.284 0.253 0.223 0.196 
0.5 | 0.675) 0.703 0.821 1.139, 2.228 & 

lo. 223] 0.195 0.171, 0.150 0.132, 0.115 
| 1.59 | 1.63 | 1.66 | 1.68 | 1.70 | 1.71 


| 














 0.279| 0.243 0.210 0.180 0.151 
0.6 | 0.671) 0.712 0.908 1.679, = 
| 0. 167 0.142' 0.121) 0.102) 0.085 
| 1.67 | 1.71 | 1.74 | 1.76 | 1.78 


0.206, 0.172 0.139) 0.109 
0.7 ; | 0.669) 0.742) 1.226) « 

| | | 0.118) 0.096) 0.077) 0.059 

| 1.75 | 1.79 | 1.82 | 1.84 




















0.136, 0.102) 0.070 

0.8 | | | O: 667| 0.863) «© 
| 0.074! 0.055, 0.037 

1.84 | 1.87 | 1.89 


| | | 0.067| 0.034 
0.9 | 0. 667| rr) 

| 0. 035) 0.018 

| | | | | | | 1.92 | | 1 1.95 
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in practice than a centrally-symmetric function, which is independent of the 
choice of axes. For this reason, we consider next this latter type of function. 


9. Centrally-symmetric correlation functions. Dedebant and Wehrte [3] 


have given a necessary and sufficient condition for p(u, v) to be a correlation 
function as 


(38) pu, v) = [ ; [ : cos (wu — pr)dF (a, u), 
or alternatively, 
(39) flew) = ere [ [008 (ou — madotu 0) du do. 


For a centrally-symmetric correlation function we can put u = r cos 6, v = rsin 6 
then p(w, v) = p(r) and 


fl, ») = as | [ cos (r+/w? + 2 cos 6:)p(r)r dé, dr, 
(27) 0 0 
where 6; = 6+ tan ‘(u/w), 
1 


= 5- | Jo(rr)p(r)r dr, where +t = Vw? + p2. 
0 


TT 


Thus, if p(u, v) is centrally-systematic, then so is f(w, u) and conversely, so that 
we get 


(40) fo) = 2 [| Julrrdo(e)ror, 
and 
(41) th mt [ ” Jolre) f(x) 87. 


We can thus find suitable forms for p(r) and f(r). In this connection the formula 


@o 5” 
| Jo(yz)e“by = 1/(a’ + 2)”, a > 0, is useful, since we can see that sane /Y) 
0 


6” 
and sgn + 2)" are possible functions for 2zf(r) and p(r) although our choice 


must be limited by the stochastic nature of p(r) as well as by its convergence. 
Thus, for example, a = n = 0 gives 1/2x7 and 1/r as spectral and correlation 
functions, but these will not converge. 

In the linear case, the Markoff process p(u) = e ™ had a spectral function 
f(r) = 1/r(a’ + 7°) which is a Cauchy distribution in one dimension. If we take a 
two-dimensional Cauchy distribution’ as our spectral function we get f(r) = 


7In the same way as the ordinary Cauchy distribution can be considered as a density 
distribution on a line produced by a point source at a distance a, radiating in all directions, 
so can a two-dimensional distribution be considered as a density distribution on a plane 
from a source at distance a. 
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6 
a/2xr(a* + 7*)*? and p(r) = — ba (e *"/r) = e °". Thus it appears that a generalised 
Cauchy distribution will be the spectral function for a generalised Markoff 
process. 
We can, of course, consider an ’’elliptical” Markoff process given by® 


2 2711/2 
u 2muv , v 
2 = 5 a ed 
(42) p(u, v) = exp E -* 4 
but, in what follows, to simplify the computation, m will be taken as zero, so 


that by changing the units in which d, and d; are measured, we will work with a 
process p(r) = ¢€ *" 


TABLE 4 
Comparison of observed serial correlations with theoretical values obtained from a 
centrally-symmetric correlation function 
Rows Columns North-east South-east 
Distance 


in miles Ob- | Calcu- | Ob- | Calcu- | Ob- | Calcu- | Ob- | Calcu- 
served served lated | served lated | served lated 


0.310) 0.368 
~~ — 0.264) 0.243 
0.090) 0.135 _ oe 


- -- oe 0.050} 0.059 | 0.129 | 0.059 
0.050 |—0.029) 0.050 — — one a 


— a — |—0.050) 0.018 | 0.070 | 0.018 
0.018 |—0.041) 0.018 aan - 


— — — |—0.020) 0.004 | 0.060 | 0.004 


1 
2 
2 
2/2 
3 
2 


tee oo 
Bas 


This process does not seem to be far removed from the type of correlation 
function experienced in agricultural field work.® Osborne [4] has mentioned 
the possible use of p, = ¢”. Mahalanobis [5] has calculated correlations for a 
paddy field of 800 cells; his values are shown in table 4, together with values of 
the function e ’. Bearing in mind that the standard error of each of Mahalanobis’ 
values is approximately 0.035, the fit is seen to be quite good, although an 
elliptical process with axes running south-east and north-east would undoubtedly 
fit the observations better. 


8 In this light, p(r) = e-*" will be called the circular Markoff process, while pu» = a 
u 
and pus = exp — e oh : will be known as degenerate Markoff processes of the first and 
second orders. 
* This is further supported by the fact that using a function of this kind it is possible to 


obtain numerically a law in substantial agreement with Fairfield-Smith’s law over a wide 
range of values. 
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10. The relative efficiencies of systematic and stratified random sampling. 
Ideally the correlation functions developed in the last section should be used 
in the expression (19)—(27), but these functions are not capable of easy integra- 
tion. An alternative approach can be made if we note that 


2 — » 
o (slosh) — o (syosyo) | 1 [ ¢ ~ le) F(u, d2)ou 
dy; dy d; 


o7(7 To) 
io |v | 
+5 ‘. (1 “ Le) F(v, ds)b0 


‘ 2 0 x 
F(u, dz) = 2 lf . Pus dv + [ Puvdv — de z pose | 
d2t/o de ds = 


af * 7 = 
F(v, di) = = Lf = Puvdu + / Puvdu — dy i par . 
dy 0 ad; ad, u=! 


It is seen that F(u, d2) and F(v, d:) are extensions of the expressions obtained for 
(c%1 — ovy)/o; in section 2. Hence, if F(u, dz) and F(v, d;) are both positive 
functions, systematic sampling is more accurate than stratified random sampling. 
A particular case of this occurs when pu» = pip: However when pu» = exp 
{—(u? + v*)"”}, F(u, de) is not always positive, since, as u increases, p.» becomes a 
convex function of v. This complicates the interpretation of (43) greatly since it 
appears that as wu varies from 0 to d; , F(u, dz) varies from + © to an unknown 
value X. This value will be positive if d, >> d,; and negative if d, >> d, so 
that if the sampling is disproportionate in the two directions systematic sampling 
will be more efficient than stratified random sampling. Furthermore, if d; = d,. =d 
and d — 0, F(u, d) — © and systematic sampling again appears to be more 
efficient. Thus in a wide variety of cases this type of systematic sampling i.e. 
SyoSYo Gives a more accurate result than random sampling. 


(48) 


where 


11. Estimation of sampling errors. An examination of formulas (7)-(18) 
shows that the principles used for the estimation of linear errors can be used in 
plane sampling. If we consider that each sample can be broken up into inde- 
pendent units each of which is situated in one of s strata, then for q replications 
we will have gr — s degrees of freedom for error. For example, roro , Tor: , Stoo 
and str; will have gninz — 1, gnz2 — 1, grime — m and gnz — 1 degrees of freedom 
respectively, so that a single sample will contain an unbiased estimate of error, 
but stosto , stost, , stist: , syosyo and syisy; will have nyn2(q — 1), m(q — 1),¢g — 1 
and q — 1 degrees of freedom and will require replication to form a valid estimate 
of error. We can however use the method of splitting our sample into several 
parts each of which will give a fairly accurate estimate of error. We may, again, 
consider the” possibility of using a set of systematic samples, which are evenly 
spaced, to estimate the sampling error, and we will see that the exclusion of the 
p’s of lower order may lead to appreciable bias unless the correlation between 
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successive terms of the sample is small, but, as Yates has pointed out, this 
method will provide an upper limit for our sampling error. These methods of 
sampling are illustrated by the examples given below. 


12. Examples. We shall consider the three methods of estimating the sampling 
errors of a systematic sample: 

(1) using sets of systematic samples randomly placed with respect to each 
other, i.e. the material to be sampled is broken up into a series of sub-areas 
or blocks and several systematic samples are taken in each block; the 
error variance is calculated from the variances of the systematic samples 
in each block, 

(2) using one set of systematic samples randomly placed, i.e. several sys- 
tematic samples are taken and the area is then broken up into sub-areas 
or blocks; the error variance is calculated from the variances of the 
portions of the systematic samples in each block, 

(3) using one systematic sample i.e. one systematic sample is taken which is 
broken into several systematic samples of wider spacing, e.g. four samples 
at four times the original spacing, the area is then divided into several 
sub-areas and the error variance is calculated from the variances of the 
portions of the sub-systematic samples in each block. 

These three methods are increasingly accurate in their estimation of the 
mean, increasingly biased in their estimation of the sampling variance, and 
decreasingly difficult in their practical application, so that our method of sam- 
pling may vary according to the population and according to the use to which the 
results are to be put. It is, for example, conceivable that subsequent sampling 
will yield an improved estimate of error so that initially only a rough guide 
may be required. 

a. If we are sampling from a continuous linear population with a large number 
of observations in each part into which we split our series, methods (1) and (2) 
will both give accurate estimates of the variance per term 


o ( a | pou +2. pu) 
d 0 us] 


Method (3) will, however, estimate o” instead of the correct variance per term, 
which is 


2 oO oe 
o ¢ _ 4 f pudu +2 3” puve). 
d 0 u=) 


Thus the estimates of sampling variance by method (3) will in general be higher 
than the estimates by methods (1) and (2), although the actual variance will be 
lower. 

b. Kendall [6, 7] has constructed 480 terms of an artificial series tiny. = 
1.1 tng: — 0.5 Un + €nge Where the e, are rectangularly distributed from —49 
to 49. For this series c* = 2379.81 and s’ = 2535.11. The series was split in six 
parts of 80 terms, for each of which n = 5, k = 16, q = 4, so that 18 degrees of 
freedom were available for error. The results for this sampling configuration are 
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given in table 5. The values in this table corroborate the conclusions for large 
samples of continuous populations. 

ec. A number of uniformity trials were taken and sampled according to the 
systems st,st; and sy;sy,. For sampling according to the system st,st; the error 


TABLE 5 


Comparison of three methods of estimating the sampling error of systematic samples 
for an autoregressive scheme 


| Estimate of sampling 
































variance per term, s?, | True sampling 
Method | based on 18 degrees of | E (s*) variance per term 
freedom 
= pices ees — ce Ra et = 
(1) 3228 2170 2170 
(2) 1872 | 2170 2167 
(3) 3709 | 2577 | 423 
TABLE 6 
Comparison of efficiencies of different methods of sampling on three uniformity trials 
aie Ci cisnscmamncnass ta | Kalamkar [8] | Wiebe [9] Wynne Sayes and Karishna, 
| Iyez [10] 
No. In Cochran’s 
{11] Catalogue.......... 72 132 108 
Secs cisle re Hochib ssn te a 4508 Potatoes Wheat | Sugar cane 
MEE BIDS ciaaessccwece's 576 1440 960 
Manne ave 23.262 587.95 270.89 
Variance per term........ 15.555 10,018.0* 1794.42 
Type of sampling. .| st; sti |syi sy: | sy; Syi | Sti Sti | Syi SYi | SYi SY1 | Sti Sti | Sy SYi | SY SY1 
Proportion sam- | 
ene 1/6 1/6 1/6 1/9 1/9 1/9 1/8 1/8 1/8 
Method of estimat- 
img Gmrror........ (2) (3) (2) (3) (2) (3) 
No. of partitions...| 1 4 4 1 4 4 1 5 5 
ra 3 3 6 4 2 4 4 2 4 
hae cameib ba caer 2 2 1 3 6 3 2 4 2 
Me escun aise on 16 2 4 20 5 10 15 3 6 
Rarely ueee 6 12 6 6 6 3 8 8 4 
iia ancien il 2 4 1 2 4 1 2 4 1 
eee 23.140) 23.435) 23.323) 586.54) 598.65) 275.29) 275.29) 266.72) 271.27 
Estimated variance 
per term......... 9.763| 2.689) 4.8895151.6 |5772.7 |7038.5 |1320.15| 799.29)1269.54 
Degrees of freedom | | 
of estimated var- | | 
Mosc ces | 48 | 12 12 80 | 12 12 60 15 15 














“* Based on the original 1500 plots. 


was estimated by taking two samples per strata, while, for sampling according 
to the system syisy; , the error was estimated by comparing sets of four samples 
in each part of the series by methods (2) and (3). The results of this sampling are 
shown in table 6. While the number of trials is small, the trend to be seen in the 
results agrees very well with the conclusions reached above. 
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13. Trend in the population. Frequently in taking samples from a population, 
we are faced with the problem of a trend. This will not greatly affect random and 
stratified random samples as estimates of the population mean, but the efficiency 
of systematic samples will be affected to a large extent. If we consider linear 
sampling, and denote by S; the sample whose first element is x; then the set of 
samples S; will usually be monotonic with 7 and the difference between S, and 
S; will be large (roughly equal to x; — zx). 

Yates [1] has suggested a method to overcome this difficulty; by letting S; 
represent 


1 a k-7 
oa E Let Vik toeee H+ Vig~r—oye + — rase-on | 
the difference between systematic samples due to trend is largely removed. 
It is easily seen that this necessitates a small loss of information, and in particular, 
for a continuous random population the variance is (n — $)o’/(n — 1)” instead 
of o/n. For plane samples, the corresponding adjusted sample will be 


il I a i ae iki — i) 
Si = (nm, — 1)(m — 1) FE ke ty + Ie Litky.g + + “— Lit (ny—1)ky of 
¢ ki — i 
= ky Vijtke + Vitky tke +++ > ee Lit-(ny—1)ky ithe 
t(ke — Jj ky — i)(ke — j 
+ ( ke Ie j) Vi,j+(ng—Dke + eee + fae Pisin oe | 


with a similar loss of information. 

Trend is, however, most likely to be appreciable in large samples, and in this 
case, the loss of information due to end adjustments is negligible, so that the 
conclusions reached above will remain unaltered. 

The author wishes to thank Dr. F. Yates and Professor M. 8. Bartlett for 
advice in the preparation of this paper. 
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REPRESENTATION OF PROBABILITY DISTRIBUTIONS BY 
CHARLIER SERIES* 


By R. P. Boas, Jr. 


Brown University 


Summary. The paper describes some results concerning the representation 
of a function by linear combinations of the successive differences of the Poisson 
distribution, not necessarily the partial sums of the type B series of Charlier. 

1. Introduction. For various purposes it is often desired to expand a probability 
distribution f(x) in a series 


(1) fle) ~ > on Oe(2), 


where the @;,(x) are a given set of standard functions. Arguments of a heuristic 
nature led Charlier [4, 5, 6] to suggest that it would be useful to take the 6,(x) 
in (1) to be either the successive derivatives or the successive differences of some 
fixed function; the two cases are often referred to as type A series and type B 
series, respectively. Charlier gave formulas for determining the coefficients in the 
two cases, but the question of whether the formal series represents the given 
function in any reasonable sense has to be investigated separately for each 
particular choice of the function generating the series. Only one special case of 
each type has been much used: for the A-series, (x) is the normal density 
function (21) *e *”’; for the B-series, (x) is the Poisson function ¢*d"/x! (when x 
is restricted to take only nonnegative integral values). We shall refer only to 
these special cases when we speak of A- and B-series in this paper. 

There are two distinct problems (which have, however, often been confused) 
connected with the representation of a function f(x) by a series (1); for con- 
venience, we shall refer to them in this paper as the practical problem and the 
theoretical problem. In the practical problem, we have an empirical function f(z), 
defined only for a finite number of values of x, which we suspect is representable 
by ¢o%(x) together with a small correction, so that we hope that a few (say three 
or four) terms of (1) may give a good representation of f(x) in a relatively simple 
analytical form with a reasonable amount of computational labor. In some cases, 
and certainly with the classical A- and B-series which we are considering, we 
could represent, as closely as desired, any f(x) (however irregular) which takes 
nonzero values at only a finite number of points; but there is no interest in doing 
this if the process involves finding too many terms of the series. (Neglect of this 
fact has led to ill-founded statements by mathematicians about the satisfactory 
nature of the A- or B-series; but see [27, pp. 38-39].) 

Thus it would be of interest to know, if possible, under what circumstances a 
given empirical density can be represented fairly well by a few terms of a series 
of a given kind. If no simple criterion can be given, it is desirable to have a means 

* Address delivered by invitation at the meeting of the Institute at Boulder, Colorado, 


on September 1, 1949. 
376 


ic 
r) 


1e 


he 
on 
ch 
of 
ty 
x 


d) 
n- 


r); 
ole 
ree 
ple 
es, 
we 
Kes 
ing 
his 


sa 
ries 
ans 


ado, 


REPRESENTATIONS OF DISTRIBUTIONS 377 


of computing coefficients which will make a few terms of (1) give the best possible 
fit—best possible being defined in a way appropriate for the problem at hand. 
In the theoretical problem, f(x) is a function defined for all values of z, or at least 
for all of an infinite set of equally spaced values of x, arising from theoretical 
considerations which suggest cof(x) as a reasonable first approximation to f(z). 
For example, the central limit theorem states that under certain conditions the 
cumulative distribution function of the sum of a large number of independent 
random variables is approximately normal; then we might expect that this 
distribution function would be representable by a series (1) with (x) the normal 
distribution function. For such theoretical purposes we should like to have 
criteria for the representability of a sufficiently general f(x) by a series (1), 
where representability is of course to be interpreted appropriately, as ordinary 
convergence, uniform convergence, convergence in mean square, asymptotic 
representation, etc., according to the requirements of the problem at hand. The 
larger the class of f(x) for which we can prove a representation theorem, the 
larger is the possible domain of applicability of the series to theoretical problems. 


2. The A-series. This paper is concerned with the B-series, but for comparison 
we first mention some properties of the A-series. In the case of the classical 
A-series, we have the attractive fact that the functions @,(x) are orthogonal 


with weight function e*”’, that is, 


[ On(X)Om(x)e™ dx = 0, m Nn. 


In fact, e*"6,,(x) is, except for a numerical factor, the nth Hermite polynomial. 
This orthogonality property enables one to compute the coefficients in a series (1) 
with great ease from 


(2) n!C, = | ? f(x)0,(a)e™ dz, 


or since 6,(x)e™ is a polynomial, from the moments of f(x). By the classical 
theory of orthogonal functions, this means that if the c, are so computed, and we 
take N + 1 terms of the series, we minimize 


(3) [ 7 eT f(x) — Fr(x)P dx 


for all possible sums 


N 


(4) Fy(x) = a Cn On(2). 


The convergence theory of Hermite series has been thoroughly investigated by 
mathematicians, so that it would appear that in theoretical problems, in which 
f(x) is given for all values of x, we are in a position to find out everything about 
the representation of f(x) by an A-series. Also in problems of practical curve- 
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fitting, the fact that the closest approximation to f(x) (in the sense (3)) by sums 
of the form (4) is given by choosing the coefficients according to (2) seems to 
leave no more to be said. 

However, the formal elegance of the A-series seems to be somewhat misleading. 
Even when a series converges it by no means follows that its Nth partial sum is 
the best selection of N terms for representing a given function. Even though the 
partial sums do give the best fit in the sense of (3), it may not be desirable to 
measure the closeness of approximation by (8); some other measure of approxi- 
mation may be better suited to the end in view. For example, it is known that 
the partial sums of Edgeworth’s series (see [8]), which is a rearrangement of the 
A-series, are more satisfactory for some purposes than the partial sums of the 
A-series with the coefficients determined by (2). More precisely, Edgeworth’s 
series furnishes an asymptotic expansion, with a remainder term whose order of 
magnitude can be estimated quite precisely, in circumstances where the series of 
orthogonal functions does not do this. Again, for practical purposes a few terms 
of the A-series sometimes exhibit undesirable properties (such as negative 
frequencies). If f(x) is a function defined only for integral values of x, A. Fisher 
[10] has suggested and applied the idea of minimizing, not (3), but the sum 
>-*~ | f(x) — F»(x)|’ in order to determine the coefficients of the approximating 
sums. 


3. The B-series. We can now see how the status of the B-series resembles or 
differs from that of the A-series. Here we deal principally with a function defined 
for integral values of x; (x) = 0(x) = e°r/x!, AO(x) = O(z) — Ox — 1), 
A*@(z) = A(A*'6(x)) and 6:(z) = A*6(x); (x) is taken to be O for negative 
integral x. We shall refer to this as the discrete case of the B-series. The liter- 
ature of the subject contains a number of rather painful attempts to put the co- 
efficients into usable form, persisting even after the simple formula 


(5) Cn = (1/n!) a (7) (—1)' av" 
had been obtained, where yz, is the nth factorial moment, 
y= Le f(k)ki/(k ~ n)!. 


Formula (5) can be derived, for example, by using orthogonality properties of 
the 6,(z). We have, in fact, that > °2-o 0n()@m(x)/00(x) is 0 or n! \~” according as 
n~ morn = Mm. 

The parameter \ in the B-series is at our disposal, and can for example be 
chosen in such a way as to improve the convergence of the series. For purposes of 
practical curve-fitting, it has been customary to choose \ equal to the mean of 
the distribution f(x), a choice which makes the coefficient c; of A@ equal to zero. 
Charlier also suggested other methods in which c¢ and c2, or G1, C2 and C3 are 
zero [7]. Such choices, of course, may reduce the amount of computation needed 
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to make use of a given number of differences in fitting a curve; aside from this 
consideration their use seems to depend on the belief that one improves the 
convergence of a series by adjusting any available parameters so that as many as 
possible of the initial terms of the series are zero. This belief does not always 
seem to be confirmed by the facts. (In particular, compare columns 2 and 5 of 
Table 1, columns 2 and 4 of Table 2, or columns 2 and 4 of Table 3.) 

The theoretical problem of what f(z) can be represented by convergent 
B-series has been studied by several authors [12, 13, 17, 19, 20, 21, 23, 24, 26, 28]; 
the study by Schmidt [24; see also 25 and 17] gives necessary and sufficient 
conditions for the representation in the case of a nonnegative f(x), so that, at 
least in all cases of interest in statistics, the theoretical problem seems to be 
completely solved. However, one of the purposes of the present paper is to 
reopen this apparently closed problem. 

There is also a continuous version of the B-series, which is suggested by the 
fact that 


(6) 6() = (22) &* [ e *™“ exp (Ae™) du 


reduces to the Poisson function e“d*/z! for positive integral x (and to 0 for 
negative integral x). This form of the B-series has not been much used, and its 
use is subject to suspicion since it has rather peculiar properties. In particular, 
it cannot represent, in any reasonable sense, a positive function f(x) or one which 
is too small as x — © [26, 3]; since the functions which present themselves for 
representation in practice are both positive and small at infinity, the continuous 
case of the B-series looks unpromising for applications. (See also [27a], la.) How- 
ever, it has been applied [15]. 

The purpose of this paper is to describe some results on the B-series which 
have been obtained in a mathematical paper [3], devoted to what we have 
called the theoretical problem; some contributions to the practical problem 
will also be given in the present paper. The starting point of this investigation 
was the question of what happens if one tries to approximate a function, not 
by the partial sums of the series (1), but by some other combination of the 
first N functions 6,(2), when approximation is taken in the sense of (unweighted) 
least-squares. This method of approximation seems well adapted to statistical 
problems, and leads to simpler mathematical work than ordinary point-by-point 
convergence of the partial sums. The B-series itself gives a least squares approxi- 
mation with a weight function 1/6(x). We consider here only the classical B-series, 
when (x) = 0(x) = €°*/z!, O,(x) = A”O(x); the main results are substantially 
the same for rather more general cases [3; see also 14, 25]. In addition, here we 
consider only nonnegative f(x), assumed zero for negative x. Functions which 
need not be zero for negative x are handled easily by generalizing the B-series 
to the form [3] 


(7) f(a) ~ L bev 2) + and" 00), 
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where V denotes the advancing difference: V6(x) = @(x) — O(a + 1); there 
seems to be no particular reason (other than a historical one) for preferring one 


kind of difference to the other. The generalized series (7) might be useful for 
graduating symmetrical probability distributions, although it does not seem to | 


have been considered in the literature (cf. [la]). 


4. Results: practical problem. Our question takes somewhat different forms 


in the two cases which we have described as the practical and the theoretical. | 


In the former, we ask what the coefficients a” should be so that 
© N 2 

(8) D | f(a) — Do af” a* o(z) 
z=0 k=Q 


shall be a minimum, where f(x) is an empirically given function and N is a given 

integer, in general not very large. If N is 0, 1 or 2, that is, if we use 1, 2 or3 

terms, the best choice of the af” in (8) can be calculated without difficulty. 
For N = 0, our question is that of finding the best least-squares fit to f(z) 


py a Poisson distribution af”e~n*/z!; the best choice of a{” is then 


(9) as? = ed fla)r/2! / Jo(2ir), 
where 


Jo(iy) = 1+ y°/(2!)? + y'/(41)? + oe 


(Jo denotes the Bessel function of order 0); on the other hand, the usual formula 
(5) gives the different coefficient 


a 


This, of course, is simpler than (9) to compute, although its use is based on the 
uncritical assumption that the first term of the series (1) is the best one to take 


if only one term is to be used. Charlier [7; see also 10, pp. 101-103] suggested a | 


different formula in which one uses, not A‘@(x), but A*‘é(px + q), the parameters 
P, 7, \ being adjusted to make the terms of (1) in A@, A’6, A°é all zero; here 0(z) 
is defined when z is not an integer by interpreting e*d*/zx! as e “A/T (x + 1), 
and not by using formula (6). Table 2 shows that in at least one numerical case 
(9) gives a better least-squares fit than Charlier’s method (and without intro- 


ducing gamma functions to take care of @(x) for fractional x). However, it is | 


not excluded that Charlier’s method will give better results in other cases, 
since with the change of the functions @,(7) the results of this paper cease to 
apply. 

For N = 1, we get the best least-squares approximation to f(x) by 


af 6(x) + af? Ae(zx) 
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qd) 


a = + Dv), 


8 
a? = — a xi gdh o 


where >» = =) f(x)o(@), a = > 2, f(x)@(a — 1), a = J(2ird), B 
—iJ, (2id), the J’s‘again denoting Bessel functions. For N = 2, the seule 
formulas involve also y = —J2(2i\) and p> > f(x)@(a — 2). They are: 





(10) 


a a 
et = Sista Eat ELSE 

(u) +B ae tay, 
ea = a do + me Da 


bcp Mes By 

ae - "aa © ~ a7] 
The functions 7”J,(iy) are real for real y, and extensive tables are available [32]. 

Some numerical examples showing the comparison between graduation by 

these formulas and by the corresponding number of terms of the B-series are 
given in Tables 1-3. It will be noticed that (as the theory indicates) one gets a 
better least-squares fit by formulas (9), (10) or (11) than by a corresponding 
number of terms of the B-series using the coefficients (5). However, one may not 
get a better fit if goodness of fit is measured in some other way, e.g. by x’. 
Unfortunately the coefficients calculated by this method increase rapidly in 
complexity as the number of terms increases, and even the coefficients for N = 3 
would involve very heavy algebra. Since numerical examples [2] indicate that it 
is often necessary to go to terms in A‘@ for a satisfactory fit, it might be worth 
while to calculate the next few coefficients. 


5. Results: theoretical problem. In the case of a theoretical distribution we ask how the coefficients $a_k^{(N)}$ should be determined so that

$$\sum_{x=-\infty}^{\infty} \Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) \Bigr|^2 \tag{12}$$

will tend to 0 as $N \to \infty$. The convergence to 0 of (12) is a rather strong kind of convergence, since it implies convergence of the approximating sums to $f(x)$, not only for each $x$, but even uniformly for all $x$. Of course, the "best" choice of $a_k^{(N)}$ as above would be expected to give convergence under the weakest hypotheses. Because these coefficients are so complicated, however, it seems desirable to make (12) only approximately a minimum. This makes no difference in the limit, although the approximation is not usually satisfactory for small values of $N$. To see the connection between the formulas used here and the "classical" formula (5) for the coefficients in (1), we note that (5) can be written


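$$a_k = \frac{(-1)^k}{k!} \left[ \frac{d^k}{dz^k} \Bigl( e^{\lambda(1-z)} \sum_{x=0}^{\infty} f(x) z^x \Bigr) \right]_{z=1}; \tag{13}$$

(5) results if we expand the derivative by Leibniz's rule and rearrange the sum. If instead we expand the exponential in a power series before differentiating in (13), we obtain

$$a_k = (-1)^k e^{\lambda} \sum_{n=0}^{\infty} f(n) \sum_{l=\max(k,n)}^{\infty} \binom{l}{k} \frac{(-\lambda)^{l-n}}{(l-n)!}.$$

If now we break this series off at $l = N$ to obtain

$$a_k^{(N)} = (-1)^k e^{\lambda} \sum_{n=0}^{N} f(n) \sum_{l=\max(k,n)}^{N} \binom{l}{k} \frac{(-\lambda)^{l-n}}{(l-n)!}, \qquad k = 0, 1, \dots, N, \tag{14}$$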


we obtain a sequence of approximations to $f(x)$ by sums $\sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x)$. In general this sequence has much better convergence properties than the partial sums of the B-series with coefficients $a_k$ given by (5). In particular, if $f(x) = 0$ for $x = -1, -2, \dots$, this sequence of approximations converges to $f(x)$ whenever $\sum_{x=0}^{\infty} |f(x)|^2$ converges. On the other hand, for nonnegative $f(x)$ it is known [24] that the B-series converges if and only if $\lim_{x\to\infty} f(x)\, 2^x x^k = 0$ for $k = 0, 1, 2, \dots$, a much more restrictive condition. Suppose we demand that the partial sums of the B-series converge in mean square, that is, that (12) tends to zero with $a_k^{(N)}$ independent of $N$. We then have the even more restrictive condition [3] that $\limsup_{x\to\infty} \{f(x)\}^{1/x} \le 1/3$.

The approximating sums with coefficients (14) have the additional property that they reproduce $f(x)$ exactly for $x = 0, 1, 2, \dots, N$. One would expect that in general they would then tend to deviate rather widely from $f(x)$ for larger $x$, and so would not be satisfactory for practical curve-fitting. However, suppose we fit such a sum not to $f(x)$ but to $f(px + q)$, with suitable integers $p$ and $q$, so that the approximation agrees with $f(x)$ at a set of values covering the whole range of definition of $f(x)$. It seems possible that the result would then give a satisfactory fit elsewhere. This possibility has not been investigated; a similar approach using the partial sums of the B-series was suggested by Charlier [7] and Fisher [10].
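Formula (14) and the reproducing property just described can be checked numerically. The Python sketch below takes $\psi(x) = e^{-\lambda}\lambda^x/x!$, with $\psi(x) = 0$ for negative $x$, and uses the backward difference $\Delta\psi(x) = \psi(x) - \psi(x-1)$. It assumes, as the value $\lambda = .631$ suggests, that the buttercup counts of Table 1 at $x = 5, 6, 7, \dots$ are to be read as $f(0), f(1), f(2), \dots$. The function names are illustrative only.

```python
from math import comb, exp, factorial

def psi(x, lam):
    """Poisson term psi(x) = e^{-lam} lam^x / x!, taken as 0 for negative integers x."""
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def delta_psi(k, x, lam):
    """Backward difference Delta^k psi(x) = sum_j (-1)^j C(k, j) psi(x - j)."""
    return sum((-1) ** j * comb(k, j) * psi(x - j, lam) for j in range(k + 1))

def coefficients_14(f, N, lam):
    """a_k^{(N)}, k = 0..N, as in (14); only f(0), ..., f(N) enter."""
    a = []
    for k in range(N + 1):
        s = 0.0
        for n in range(N + 1):
            for l in range(max(k, n), N + 1):
                s += f[n] * comb(l, k) * (-lam) ** (l - n) / factorial(l - n)
        a.append((-1) ** k * exp(lam) * s)
    return a

def b_series(a, x, lam):
    """Approximating sum: sum_k a_k Delta^k psi(x)."""
    return sum(ak * delta_psi(k, x, lam) for k, ak in enumerate(a))

# Table 1 (buttercups): counts at x = 5, 6, 7 read as f(0), f(1), f(2) (assumption).
lam = 0.631
a = coefficients_14([133, 55, 23], N=2, lam=lam)
fitted = [round(b_series(a, x, lam), 1) for x in range(6)]
print(fitted)  # [133.0, 55.0, 23.0, 9.1, 2.6, 0.5]
```

With $N = 2$ the sum reproduces the first three frequencies exactly. Beyond them it gives about 9.1, 2.6 and 0.5, which agrees with column 6 of Table 1 to the printed precision.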


6. The continuous case of the B-series. In the continuous case we again ask, not when

$$f(x) = \sum_{n=0}^{\infty} a_n \Delta^n \psi(x) \tag{15}$$

with uniform convergence in every finite interval, but when

$$f(x) = \operatorname*{l.i.m.}_{N\to\infty} \sum_{n=0}^{N} a_n^{(N)} \Delta^n \psi(x), \tag{16}$$

which means that

$$\lim_{N\to\infty} \int_{-\infty}^{\infty} \Bigl| f(x) - \sum_{n=0}^{N} a_n^{(N)} \Delta^n \psi(x) \Bigr|^2 dx = 0. \tag{17}$$

For (15) the following negative results are known [26]. If $f(x) \ge 0$, (15) cannot converge uniformly on every finite interval (unless $f(x) \equiv 0$). Moreover, the series, if convergent uniformly on every finite interval, cannot converge to $f(x)$ unless the Fourier transform of $f(x)$ vanishes outside $(-\pi, \pi)$. This condition automatically excludes any $f(x)$ which vanishes for all large $|x|$, or which is merely too small as $x \to \infty$. Nevertheless, Jørgensen [15] applies the continuous case successfully to practical problems. A possible explanation of this apparent discrepancy is that if the $a_n^{(N)}$ in (16) are properly determined, (16) will be true under fairly general conditions. To be sure, the mean square difference in (17) cannot be made arbitrarily small unless the Fourier transform $g(x)$ of $f(x)$ vanishes outside $(-\pi, \pi)$. But if $|f(x)|^2$ is integrable, the difference can be made small if $g(x)$ is itself small outside $(-\pi, \pi)$. If $g(x)$ does vanish outside $(-\pi, \pi)$, then (16) is true. In fact the coefficients $a_k^{(N)}$ can then be taken the same as in (14), so that the approximating sums depend only on the values of $f(x)$ for integral values of $x$; these values are known to determine $f(x)$ under our hypotheses on $g(x)$.

TABLE 1
Number of petals on buttercups. λ = .631

| x | (1) Observed frequency | (2) Calculated, 3 terms (formula 5) | (3) Calculated, 1 term (formula 9) | (4) Calculated, 2 terms (formula 10) | (5) Calculated, 3 terms (formula 11) | (6) Calculated, 3 terms (formula 14) |
|---|---|---|---|---|---|---|
| 5 | 133 | 134.9 | 119.9 | 130.6 | 132.9 | 133.0 |
| 6 | 55 | 51.6 | 75.6 | 62.3 | 55.3 | 55.0 |
| 7 | 23 | 22.5 | 22.5 | 13.3 | 22.1 | 23.0 |
| 8 | 7 | 9.5 | 5.0 | 1.5 | 8.5 | 9.1 |
| 9 | 2 | 2.9 | 0.8 | 0.0 | 2.4 | 2.6 |
| 10 | 2 | 0.6 | 0.1 | 0.0 | 0.5 | 0.5 |
| Total | 222 | 222.0 | 223.9 | 207.7 | 221.7 | 223.2 |


7. Discussion of some numerical results. Table 1. Column 2 gives the fit by two terms of the B-series, as calculated by Charlier [7]. There are really three terms, since the coefficient of $\Delta\psi$ is zero when formula (5) is used; that is, terms through $\Delta^2\psi$ are used. Column 3 gives the best least-squares fit by a single term, i.e., a Poisson distribution, calculated by formula (9); it is clear that this term alone does not represent the observations very well. Column 4 gives the best least-squares fit by terms through $\Delta\psi$. Column 5 gives the best least-squares fit by terms through $\Delta^2\psi$; the improvement over Charlier's fit by the same number of terms is evident by inspection. Column 6 gives, for comparison, the same number of terms calculated by formula (14). This gives an approximation to the best least-squares fit and necessarily reproduces the data exactly for the first three values of $x$. The fact that (14) gives good results here is presumably connected with the small size of $\lambda$.

TABLE 2
Failure of grains of barley. λ = 2.757

| x | (1) Observed | (2) Calculated (Charlier) | (3) Calculated, 1 term (formula 9) | (4) Calculated, 2 terms (formula 10) | (5) Calculated, 3 terms (formula 11) |
|---|---|---|---|---|---|
| 0 | 53 | 63 | 47.3 | 49.9 | 48.4 |
| 1 | 131 | 139 | 130.4 | 134.7 | 133.4 |
| 2 | 180 | 174 | 179.8 | 181.6 | 182.3 |
| 3 | 170 | 151 | 165.3 | 163.2 | 164.3 |
| 4 | 111 | 111 | 113.9 | 110.0 | 109.8 |
| 5 | 50 | 60 | 62.7 | 59.3 | 58.1 |
| 6 | 22 | 32 | 28.8 | 26.5 | 25.2 |
| 7 | 22 | 14 | 11.4 | 10.2 | 9.3 |
| 8 | 7 | 6 | 3.9 | 3.4 | 2.9 |
| 9 | 2 | 2 | 1.1 | 1.0 | 0.8 |
| 10 | 1 | 0 | 0.3 | 0.2 | 0.2 |
| Total | 749 | 752 | 744.9 | 740.0 | 734.7 |

Table 2. Column 2 gives the values calculated by Charlier [7] for a fit after the linear transformation $x \to px + q$. Here $\lambda$, $p$ and $q$ were chosen to make the terms in $\Delta\psi$, $\Delta^2\psi$, $\Delta^3\psi$ all zero, and the values were read to the nearest integer from Charlier's graph. Column 3 gives the best least-squares single-term fit calculated by formula (9); this is a considerable improvement for $x \le 6$, but for the remainder of the table it is rather poor. Column 4 gives the best least-squares fit by two terms; column 5, that by three. The $\chi^2$-test indicates that the graduation is rather poor in all cases.
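Formula (9) itself is not reproduced in this section. However, the best least-squares fits in the tables are by definition minimizers of (12), so they can be recomputed directly. The Python sketch below does this with NumPy's `lstsq`. It assumes that the sum in (12) may be truncated at a large $x$ (here 60) and that $f(x) = 0$ beyond the observed range. The names `design` and `least_squares_fit` are illustrative.

```python
import numpy as np
from math import comb, exp, factorial

def psi(x, lam):
    """Poisson term e^{-lam} lam^x / x!, zero for negative x."""
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def design(N, lam, X):
    """Matrix whose (x, k) entry is Delta^k psi(x), for x = 0..X and k = 0..N."""
    return np.array([[sum((-1) ** j * comb(k, j) * psi(x - j, lam) for j in range(k + 1))
                      for k in range(N + 1)]
                     for x in range(X + 1)])

def least_squares_fit(f, N, lam, X=60):
    """Minimise sum_{x=0}^{X} |f(x) - sum_k a_k Delta^k psi(x)|^2, with f(x) = 0 past the data.
    Rows x < 0 are omitted because both f(x) and Delta^k psi(x) vanish there."""
    target = np.zeros(X + 1)
    target[: len(f)] = f
    A = design(N, lam, X)
    a, *_ = np.linalg.lstsq(A, target, rcond=None)
    return a, A @ a

barley = [53, 131, 180, 170, 111, 50, 22, 22, 7, 2, 1]   # Table 2, column 1
a, fit = least_squares_fit(barley, N=0, lam=2.757)
print(np.round(fit[:11], 1))
```

For $N = 0$ the fitted values agree with column 3 of Table 2 to within roughly 0.1 to 0.2, presumably because of rounding in the original computation. Passing `N=1` or `N=2` solves the corresponding two- and three-term problems.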

Table 3. Column 2 gives the classical calculation with terms through $\Delta^2\psi$; this was given by A. Fisher [10] and (more accurately) by Aroian [2]. Columns 3 and 4 give the best least-squares approximations by two and three terms. In this sense column 4 is better than column 2, as expected. However, column 4 is a poorer fit when tested by $\chi^2$, chiefly because of the poor fit at $x = 0$. It should be noted that two more terms of the B-series give a more satisfactory fit [2].

TABLE 3
α-particles from a bar of polonium. λ = 3.87155

| x | (1) Observed | (2) Calculated, 3 terms (formula 5) | (3) Calculated, 2 terms (formula 10) | (4) Calculated, 3 terms (formula 11) |
|---|---|---|---|---|
| 0 | 57 | 49.5 | 51.3 | 45.2 |
| 1 | 203 | 201.3 | 213.3 | 190.9 |
| 2 | 383 | 403.4 | 399.0 | 393.5 |
| 3 | 525 | 532.3 | 524.8 | 529.8 |
| 4 | 532 | 520.6 | 517.2 | 525.4 |
| 5 | 408 | 402.6 | 407.7 | 409.7 |
| 6 | 273 | 254.8 | 267.7 | 261.9 |
| 7 | 139 | 137.1 | 150.6 | 141.1 |
| 8 | 45 | 64.0 | 74.1 | 65.3 |
| 9 | 27 | 26.1 | 32.4 | 26.3 |
| 10 | 10 | 9.4 | 12.8 | 9.3 |
| 11 | 4 | 3.0 | 4.6 | 2.9 |
| 12 | 0 | 0.9 | 1.5 | 0.8 |
| 13 | 1 | 0.2 | 0.5 | 0.2 |
| 14 | 1 | 0.0 | 0.1 | 0.0 |
| Total | 2608 | 2605.2 | 2657.6 | 2602.3 |
| χ² | | 10.2 | 16.2 | 11.4 |
| n | | 7 | 8 | 7 |
8. Proofs: theoretical problem. We now outline the proofs of the results which we have stated. They depend on the fact that the numbers $\psi(x)$ ($x = 0, \pm1, \pm2, \dots$, where $\psi(x) = 0$ when $x$ is a negative integer) are the Fourier coefficients of the function $\varphi(u) = e^{-\lambda}\exp(\lambda e^{iu})$, i.e.

$$\psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} \varphi(u)\, e^{-ixu}\, du, \qquad x = 0, \pm1, \pm2, \dots$$

Furthermore,

$$\Delta^k \psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} \varphi(u) (1 - e^{iu})^k e^{-ixu}\, du.$$
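Both identities are easy to confirm numerically. The Python sketch below approximates the integrals by the discrete Fourier transform on an equally spaced grid over one period; since $\varphi$ is periodic, $[0, 2\pi)$ may be used in place of $(-\pi, \pi)$. The values $\lambda = 2.757$ and $M = 128$ grid points are arbitrary choices. For a trigonometric series with such rapidly decaying coefficients the discrete sums are accurate to rounding error.

```python
import numpy as np
from math import exp, factorial

lam, M = 2.757, 128                       # lambda and grid size (arbitrary choices)
u = 2 * np.pi * np.arange(M) / M          # equally spaced grid over one period
phi = np.exp(-lam) * np.exp(lam * np.exp(1j * u))

# (1/M) * sum_m phi(u_m) e^{-i x u_m} approximates (2 pi)^{-1} * integral of phi(u) e^{-ixu} du;
# np.fft.fft uses exactly the kernel e^{-2 pi i x m / M} = e^{-i x u_m}.
coef = np.fft.fft(phi) / M

poisson = np.array([exp(-lam) * lam**x / factorial(x) for x in range(20)])
print(np.max(np.abs(coef[:20] - poisson)))    # tiny: psi(x) are the Fourier coefficients
print(np.max(np.abs(coef[M - 10:])))          # indices x = -10..-1: coefficients vanish

# Same check for Delta^2 psi, the Fourier coefficients of phi(u) (1 - e^{iu})^2.
coef2 = np.fft.fft(phi * (1 - np.exp(1j * u)) ** 2) / M
pad = np.concatenate([np.zeros(2), poisson])  # psi(-2), psi(-1), psi(0), ..., psi(19)
delta2 = pad[2:] - 2 * pad[1:-1] + pad[:-2]   # psi(x) - 2 psi(x-1) + psi(x-2), x = 0..19
print(np.max(np.abs(coef2[:20] - delta2)))
```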
If we then assume the condition $\sum_{x=-\infty}^{\infty} |f(x)|^2 < \infty$, with $f(x) = 0$ for $x = -1, -2, \dots$, the numbers $f(x)$ are the Fourier coefficients of a function $g(u)$ of integrable square. This follows from the Riesz–Fischer theorem in the theory of Fourier series [31, p. 74]:

$$f(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} g(u)\, e^{-ixu}\, du, \qquad x = 0, \pm1, \pm2, \dots$$
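Thus

$$f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{-ixu} \Bigl[ g(u) - \varphi(u) \sum_{k=0}^{N} a_k^{(N)} (1 - e^{iu})^k \Bigr] du, \tag{18}$$

and so the expressions on the left appear as the Fourier coefficients of the expression in square brackets on the right. By Parseval's theorem for Fourier series [31, p. 76], then, we have

$$\sum_{x=-\infty}^{\infty} \Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) \Bigr|^2 = (2\pi)^{-1} \int_{-\pi}^{\pi} \Bigl| g(u) - \varphi(u) \sum_{k=0}^{N} a_k^{(N)} (1 - e^{iu})^k \Bigr|^2 du. \tag{19}$$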
x | k=0 


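Thus we have reduced the problem of minimizing the mean-square difference on the left of (19) to that of minimizing the integral on the right of (19). By rearranging the sum in the integrand, we see that an equivalent problem is to minimize

$$D = (2\pi)^{-1} \int_{-\pi}^{\pi} \Bigl| g(u) - \varphi(u) \sum_{k=0}^{N} c_k^{(N)} e^{iku} \Bigr|^2 du, \tag{20}$$

where the $c_k^{(N)}$ and $a_k^{(N)}$ are readily expressed in terms of each other; in fact,

$$a_k^{(N)} = (-1)^k \sum_{l=k}^{N} \binom{l}{k} c_l^{(N)}. \tag{21}$$

Moreover,

$$|\varphi(u)| = e^{-\lambda + \lambda \cos u} \ge e^{-2\lambda}. \tag{22}$$

Since $|\varphi(u)| \ge e^{-2\lambda} > 0$, we can write $D$ in the form

$$D = (2\pi)^{-1} \int_{-\pi}^{\pi} \Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku} \Bigr|^2 |\varphi(u)|^2\, du,$$

so that

$$\int_{-\pi}^{\pi} \Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku} \Bigr|^2 du \;\ge\; 2\pi D \;\ge\; e^{-4\lambda} \int_{-\pi}^{\pi} \Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku} \Bigr|^2 du,$$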
since $e^{-2\lambda} \le |\varphi(u)| \le 1$. Thus we can make $D$ arbitrarily small if and only if we can make

$$D^* = (2\pi)^{-1} \int_{-\pi}^{\pi} \Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku} \Bigr|^2 du \tag{23}$$

arbitrarily small. Now the Fourier coefficients of $g(u)$ are $f(x)$. Those of $1/\varphi(u)$ are $e^{\lambda}(-\lambda)^x/x!$ for $x \ge 0$ and 0 for $x < 0$. By the convolution theorem for Fourier coefficients [31, p. 90], the $n$th Fourier coefficient of $g(u)/\varphi(u)$ is

$$\sum_{k=0}^{n} f(n-k)\, e^{\lambda} \frac{(-\lambda)^k}{k!}, \qquad n = 0, 1, 2, \dots, \tag{24}$$

and zero for $n < 0$. Furthermore, it is well known from the theory of Fourier series that $D^*$ is a minimum if the $c_k^{(N)}$ are chosen as the first $N + 1$ Fourier coefficients of $g(u)/\varphi(u)$. This minimum is arbitrarily small for large enough $N$ if and only if the Fourier coefficients of $g(u)/\varphi(u)$ are zero for negative indices, which is in fact the case. If we then take the values (24) for $c_k^{(N)}$, $k = 0, 1, \dots, N$, and express $a_k^{(N)}$ in terms of $c_k^{(N)}$ by (21), we arrive at formula (14).

It will be observed that the minimum of $D$ is connected with the minimum of $D^*$ by

$$\min D \le \max|\varphi(u)|^2 \cdot \min D^* \le \min D^* \le \frac{\min D}{\min|\varphi(u)|^2} \le e^{4\lambda} \min D.$$

So all that we can say about the approximation given by (14) with a small $N$ is that it is an upper bound for the best possible mean-square approximation by sums (18), and that the best mean-square approximation is at least $e^{-4\lambda}$ times it. This means that if $D^*$ is small, so is $D$. But $D^*$ need not be as small as $D$: the chain above guarantees only $\min D^* \le e^{4\lambda} \min D$. Hence we cannot in general expect the coefficients (14) to be suitable for practical curve-fitting, since they may increase the mean-square error by a factor of as much as $e^{4\lambda}$. We may, however, expect (14) to be better when $\lambda$ is small.
Now, as we have already observed,

$$f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x)$$

is the $x$th Fourier coefficient of

$$g(u) - \varphi(u) \sum_{k=0}^{N} a_k^{(N)} (1 - e^{iu})^k.$$

We write (18) in the form

$$f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} \varphi(t) \Bigl[ g(t)/\varphi(t) - \sum_{k=0}^{N} a_k^{(N)} (1 - e^{it})^k \Bigr] e^{-ixt}\, dt \tag{25}$$

and choose the $a_k^{(N)}$ as specified above. The expression in square brackets is then $g(t)/\varphi(t)$ minus the first $N + 1$ terms of its Fourier series, so the Fourier series of $[\cdots]$ involves no $e^{ikt}$ with $k < N + 1$. Since the Fourier series of $\varphi(t)$ involves no $e^{ikt}$ with $k < 0$, the product $\varphi(t)[\cdots]$ also involves no $e^{ikt}$ with $k < N + 1$. Therefore the integral in (25) is zero for $x = 0, 1, 2, \dots, N$, since it represents the $x$th Fourier coefficient of $\varphi(t)[\cdots]$. In other words,

$$f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) = 0, \qquad x = 0, 1, 2, \dots, N.$$

Furthermore, we can compute $f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x)$ for $x > N$ by the convolution formula applied to the Fourier series of $\varphi(t)$ and of $[\cdots]$. For $n > N$, the $n$th Fourier coefficient of $[\cdots]$ is just that of $g(t)/\varphi(t)$, given by (24), and that of $\varphi(t)$ is $e^{-\lambda}\lambda^n/n!$. So for $x > N$

$$f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) = \sum_{l=N+1}^{x} \Bigl( \sum_{k=0}^{l} f(l-k)\, e^{\lambda} \frac{(-\lambda)^k}{k!} \Bigr) \psi(x - l),$$

and in particular

$$f(N+1) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(N+1) = \sum_{k=0}^{N+1} f(N+1-k) \frac{(-\lambda)^k}{k!}.$$
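The last two displays can be checked with a short Python sketch. It forms $c_n^{(N)}$ from (24) and evaluates the approximating sum in the equivalent form $\sum_{l} c_l^{(N)} \psi(x - l)$, which is the content of (20) and (21). It then compares the error at $x = N + 1$ with the closed form. As before, the Table 1 counts are assumed to be read as $f(0), f(1), \dots$; the helper names are hypothetical.

```python
from math import exp, factorial

def c_coefficients(f, N, lam):
    """c_n^{(N)} for n = 0..N: the Fourier coefficients (24) of g/phi."""
    return [exp(lam) * sum(f[n - k] * (-lam) ** k / factorial(k) for k in range(n + 1))
            for n in range(N + 1)]

def approx(c, x, lam):
    """sum_k a_k^{(N)} Delta^k psi(x), evaluated as sum_l c_l psi(x - l)."""
    return sum(cl * exp(-lam) * lam ** (x - l) / factorial(x - l)
               for l, cl in enumerate(c) if x >= l)

f = [133, 55, 23, 7, 2, 2]      # Table 1 counts, read as f(0), ..., f(5) (assumption)
lam, N = 0.631, 2
c = c_coefficients(f, N, lam)
residual = [f[x] - approx(c, x, lam) for x in range(len(f))]
closed_form = sum(f[N + 1 - k] * (-lam) ** k / factorial(k) for k in range(N + 2))
print(residual[: N + 1])             # zero up to rounding: exact fit for x = 0..N
print(residual[N + 1], closed_form)  # the two agree, about -2.13
```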


9. Proofs: practical problem. So far we have obtained only an estimate for the minimum of $D$, by obtaining the minimum of $D^*$. This estimate is satisfactory for large $N$, and so for theoretical purposes. However, to obtain precisely the best mean-square approximation to $f(x)$ by a small number $N$ of terms of the sum in (18), we have to choose $a_k^{(N)}$ so that

$$\sum_{k=0}^{N} a_k^{(N)} (1 - e^{it})^k \varphi(t)$$

is the first $N + 1$ terms of the expansion of $g(t)$ in terms of an orthonormal set. This set is obtained by replacing the functions $(1 - e^{it})^k \varphi(t)$, $k = 0, 1, 2, \dots$, by an equivalent orthonormal set. The process for obtaining this orthonormal set is well known. It turns out that the integrals which have to be evaluated are expressible in terms of Bessel functions of imaginary argument, and the first orthonormal functions are

$$\omega_0(t) = (2\pi)^{-1/2} \alpha_0^{-1/2} \exp(\lambda e^{it}),$$

$$\omega_1(t) = (2\pi)^{-1/2} \frac{\alpha_0 e^{it} - \alpha_1}{[\alpha_0(\alpha_0^2 - \alpha_1^2)]^{1/2}} \exp(\lambda e^{it}),$$

$$\omega_2(t) = (2\pi)^{-1/2} \frac{(\alpha_0^2 - \alpha_1^2) e^{2it} - \alpha_1(\alpha_0 - \alpha_2) e^{it} - (\alpha_0\alpha_2 - \alpha_1^2)}{[(\alpha_0^2 - \alpha_1^2)(\alpha_0 - \alpha_2)(\alpha_0^2 + \alpha_0\alpha_2 - 2\alpha_1^2)]^{1/2}} \exp(\lambda e^{it}),$$
where $\alpha_0 = J_0(2i\lambda)$, $\alpha_1 = -iJ_1(2i\lambda)$, $\alpha_2 = -J_2(2i\lambda)$. It is then a simple matter, first to express $\omega_0, \omega_1, \omega_2$ in terms of $\varphi(t)$, $\varphi(t)(1 - e^{it})$, $\varphi(t)(1 - e^{it})^2$, and then to determine $a_0^{(0)}$; $a_0^{(1)}, a_1^{(1)}$; and $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}$. For example, the best two-term approximation for $g(u)$ in terms of $\omega_0(u)$, $\omega_1(u)$ is

$$g(u) \approx \omega_0(u) \int_{-\pi}^{\pi} g(v)\, \overline{\omega_0(v)}\, dv + \omega_1(u) \int_{-\pi}^{\pi} g(v)\, \overline{\omega_1(v)}\, dv,$$

and the integrals $\int_{-\pi}^{\pi} g(u)\, \overline{\omega_k(u)}\, du$ are combinations of terms of the form

$$\int_{-\pi}^{\pi} g(u)\, e^{-inu}\, \overline{\varphi(u)}\, du.$$

These in turn are Fourier coefficients of $g(u)\overline{\varphi(u)}$. By the Parseval formula they are therefore expressible as sums of products of the Fourier coefficients of $g(u)$ (namely, $f(n)$) and of $\varphi(u)$ (namely, $\psi(n)$). We omit the algebraic work; the results are given in formulas (9), (10), (11).
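The orthonormality of the first three functions of Section 9 can be checked numerically. Here they are written $\omega_0, \omega_1, \omega_2$, with $\alpha_0 = J_0(2i\lambda)$, $\alpha_1 = -iJ_1(2i\lambda)$, $\alpha_2 = -J_2(2i\lambda)$. The Python sketch below computes $\alpha_k$ by quadrature as $(2\pi)^{-1}\int_{-\pi}^{\pi} e^{2\lambda\cos t}\cos kt\,dt$, which equals the modified Bessel value $I_k(2\lambda)$. It then forms the Gram matrix of the three functions, which should be the identity. The values $\lambda = 1.5$ and the grid size are arbitrary.

```python
import numpy as np

lam, M = 1.5, 256                       # illustrative lambda and grid size
t = 2 * np.pi * np.arange(M) / M - np.pi
E = np.exp(lam * np.exp(1j * t))        # exp(lambda e^{it})

# alpha_k = (2 pi)^{-1} * integral of e^{2 lambda cos t} cos(kt) dt = I_k(2 lambda)
w = np.exp(2 * lam * np.cos(t))
a0, a1, a2 = (np.mean(w * np.cos(k * t)) for k in range(3))

z = np.exp(1j * t)
om0 = E / np.sqrt(2 * np.pi * a0)
om1 = (a0 * z - a1) * E / np.sqrt(2 * np.pi * a0 * (a0**2 - a1**2))
num2 = (a0**2 - a1**2) * z**2 - a1 * (a0 - a2) * z - (a0 * a2 - a1**2)
den2 = (a0**2 - a1**2) * (a0 - a2) * (a0**2 + a0 * a2 - 2 * a1**2)
om2 = num2 * E / np.sqrt(2 * np.pi * den2)

# Gram matrix: entry (j, k) = integral over (-pi, pi) of om_j * conj(om_k) dt, by the
# trapezoidal rule on a periodic grid (spectrally accurate here).
basis = np.array([om0, om1, om2])
gram = (2 * np.pi / M) * basis @ basis.conj().T
print(np.round(gram.real, 10))
```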






10. Proofs: continuous case. In the continuous case of our approximation problem we assume that $|f(x)|^2$ is integrable on $(-\infty, \infty)$ and look for coefficients $a_k^{(N)}$ that will minimize

$$D = \int_{-\infty}^{\infty} \Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) \Bigr|^2 dx,$$

where

$$\psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} \varphi(u)\, e^{-ixu}\, du, \qquad \Delta^k \psi(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} \varphi(u)\, e^{-ixu} (1 - e^{iu})^k\, du.$$

Let $f(x)$ be the Fourier transform of $g(u)$. We can regard $\psi(x)$ as the Fourier transform of $\varphi(u)$, with $\varphi(u)$ defined as zero outside $(-\pi, \pi)$. Then by Parseval's theorem for Fourier transforms we have

$$2\pi D = \int_{|t| > \pi} |g(t)|^2\, dt + \int_{-\pi}^{\pi} \Bigl| g(t) - \varphi(t) \sum_{k=0}^{N} a_k^{(N)} (1 - e^{it})^k \Bigr|^2 dt.$$

Clearly, then, $D$ cannot be made arbitrarily small unless $g(t) = 0$ almost everywhere outside $(-\pi, \pi)$. If this condition is satisfied, $D$ reduces to the same form which it had in the discrete case (see (19)). Thus the problem of mean-square approximation in the continuous case reduces, if it can be solved at all, to the corresponding problem in the discrete case.





11. Representation by a series. We consider the representation of a given $f(x)$ by the B-series with the classical coefficients (5), but with mean-square convergence of the series. Here we assume that $f(x) \ge 0$, $f(x) = 0$ for $x = -1, -2, \dots$, and $\sum_{x=0}^{\infty} |f(x)|^2 < \infty$, and ask whether we can have

$$\lim_{n\to\infty} \sum_{x=0}^{\infty} \Bigl| f(x) - \sum_{k=0}^{n} a_k \Delta^k \psi(x) \Bigr|^2 = 0, \tag{26}$$

where here the $a_k$ do not depend on $n$ (but are not, in principle, required to have the form (5)). From our previous discussion this is equivalent to

$$\lim_{n\to\infty} \int_{-\pi}^{\pi} \Bigl| g(t) - \varphi(t) \sum_{k=0}^{n} a_k (1 - e^{it})^k \Bigr|^2 dt = 0,$$

and this implies that

$$\lim_{n\to\infty} |a_n|^2 \int_{-\pi}^{\pi} |\varphi(t)|^2\, |1 - e^{it}|^{2n}\, dt = 0.$$

From this it follows easily that

$$\sum_{n=0}^{\infty} a_n (1 - e^{it})^n$$

converges for $|t| < \pi$, or in other words that

$$H(z) = \sum_{n=0}^{\infty} a_n (1 - z)^n$$

converges on $|z| = 1$ except perhaps for $z = -1$, and hence converges in $|1 - z| < 2$. By analytic continuation it is easy to identify $H(z)$ with $F(z)/\Phi(z)$, where for $|z| < 1$

$$F(z) = \sum_{n=0}^{\infty} f(n) z^n, \qquad \Phi(z) = \sum_{n=0}^{\infty} \psi(n) z^n.$$

Since $\Phi(z) = e^{\lambda(z-1)}$ and $1/\Phi(z)$ have no singular points, $F(z) = H(z)\Phi(z)$ is analytic in $|1 - z| < 2$, and hence in particular on $0 \le z < 3$. Since $F(z)$ is a power series with nonnegative coefficients, it has a singular point at the positive real point of its circle of convergence [30, p. 214]. So it must be analytic at least in $|z| < 3$. This gives the restriction $\limsup_{n\to\infty} f(n)^{1/n} \le 1/3$. Nevertheless, as we know, $f(x)$ is represented in mean square by a sequence of sums of terms $a_k^{(N)} \Delta^k \psi(x)$ even if we assume only that $\sum |f(n)|^2$ converges.

In the continuous case, suppose $f(x) \ge 0$ and

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \Bigl| f(x) - \sum_{k=0}^{n} a_k \Delta^k \psi(x) \Bigr|^2 dx = 0. \tag{27}$$

Then we must have $g(x) = 0$ almost everywhere outside $(-\pi, \pi)$, and, as we saw previously, (26) then holds also. Now since $f(x) \ge 0$, $g(t)$ has derivatives of all orders if it has derivatives of all orders at $t = 0$ [29, p. 90]. It is easily seen from this that $g(t)$ is analytic for all real $t$ if it is analytic at $t = 0$.

On the one hand, unless $f(x) \equiv 0$, $g(t)$ cannot be analytic for all real $t$ if, as we are supposing, $g(t)$ vanishes outside $(-\pi, \pi)$. On the other hand, $H(e^{it}) = g(t)/\varphi(t)$ for real values of $t$ close to 0, and so, if $t$ is regarded as a complex variable, for complex values of $t$ near 0. Since $\varphi(t)$ is analytic everywhere, $g(t)$ is analytic at $t = 0$. From this contradiction we infer that a nonnegative $f(x)$ can never be represented in the form (27), although it may perfectly well be represented by

$$\lim_{N\to\infty} \int_{-\infty}^{\infty} \Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)} \Delta^k \psi(x) \Bigr|^2 dx = 0.$$
HEURISTIC APPROACH TO THE KOLMOGOROV-SMIRNOV THEOREMS¹

By J. L. Doob

University of Illinois


1. Introduction and summary. Asymptotic theorems on the difference between the (empirical) distribution function calculated from a sample and the true distribution function governing the sampling process are well known. Feller⁴ has obtained simple proofs of an elementary nature for the basic theorems of Kolmogorov² and Smirnov³. Even these proofs, however, conceal to some extent, in their emphasis on elementary methodology, the naturalness of the results (qualitatively at least) and their mutual relations. Feller suggested that the author publish his own approach (which had also been used by Kac). This approach does not have these disadvantages, although rather deep analysis would be necessary for its rigorous justification. It is therefore presented, at one critical point, as heuristic reasoning: reasoning of the kind that leads to results in investigations of this sort, even though the easiest proofs may use entirely different methods.

No calculations are required to obtain the qualitative results, that is the 
existence of limiting distributions for large samples of various measures of the 
discrepancy between empirical and true distribution functions. The numerical 
evaluation of these limiting distributions requires certain results concerning the 
Brownian movement stochastic process and its relation to other Gaussian 
processes which will be derived in the Appendix. 


2. The problem. Let $x_1, x_2, \dots$ be mutually independent random variables with a common distribution function $F(\lambda)$,

$$F(\lambda) = \Pr\{x_j \le \lambda\}.$$

In statistical language $x_1, \dots, x_n$ form a sample of $n$ drawn from the distribution with distribution function $F(\lambda)$. Let $\nu_n(\lambda)$ be the number of these $x_j$'s which are $\le \lambda$. According to the strong law of large numbers, for each $\lambda$

$$\lim_{n\to\infty} \frac{\nu_n(\lambda)}{n} = F(\lambda) \tag{2.1}$$

with probability 1. For fixed $n$, $\nu_n(\lambda)/n$ is itself a distribution function, the empirical distribution function, which depends on the sample values $x_1, \dots, x_n$. An elaboration of the argument which led to (2.1) shows that (2.1) is true uniformly in $\lambda$, with probability 1. That is, if

$$D_n = \operatorname*{L.U.B.}_{-\infty<\lambda<\infty} \Bigl| \frac{\nu_n(\lambda)}{n} - F(\lambda) \Bigr|, \tag{2.2}$$

then $D_n$ is a random variable and

$$\lim_{n\to\infty} D_n = 0$$

with probability 1.⁵ This result would be of limited practical statistical importance except that the distribution of $D_n$ does not depend on the distribution function $F(\lambda)$ if $F(\lambda)$ is continuous. In fact, in that case the random variables $F(x_1), F(x_2), \dots$ are mutually independent and each is uniformly distributed in the interval $(0, 1)$. If $\bar\nu_n(\mu)$ is the number of the $F(x_j)$, $j \le n$, which are $\le \mu$, then

$$\operatorname*{L.U.B.}_{0\le\mu\le1} \Bigl| \frac{\bar\nu_n(\mu)}{n} - \mu \Bigr| = \operatorname*{L.U.B.}_{-\infty<\lambda<\infty} \Bigl| \frac{\nu_n(\lambda)}{n} - F(\lambda) \Bigr|.$$

Thus, in finding the distribution of $D_n$, it is no restriction (replacing $x_j$ by $F(x_j)$ if necessary) to assume that $F(\lambda) = \lambda$ for $0 \le \lambda \le 1$, and

$$D_n = \operatorname*{L.U.B.}_{0\le\lambda\le1} \Bigl| \frac{\nu_n(\lambda)}{n} - \lambda \Bigr|. \tag{2.2'}$$

The results will hold for $D_n$ defined by (2.2) for any continuous $F(\lambda)$. We shall also consider $D_n^+$ and $D_n^-$, defined by

$$D_n^+ = \operatorname*{L.U.B.}_{0\le\lambda\le1} \Bigl[ \frac{\nu_n(\lambda)}{n} - \lambda \Bigr], \qquad D_n^- = -\operatorname*{G.L.B.}_{0\le\lambda\le1} \Bigl[ \frac{\nu_n(\lambda)}{n} - \lambda \Bigr]. \tag{2.3}$$

Again the results will hold for every continuous $F(\lambda)$, with the obvious definitions of $D_n^+$ and $D_n^-$ in the general case.

The problem is to find the limiting distributions of (properly normalized) $D_n$, $D_n^+$, $D_n^-$ when $n \to \infty$.

¹ Research connected with a probability project at Cornell University under an ONR contract.
² Inst. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 83–91.
³ Rec. Math. (Matematičeskiĭ Sbornik), N.S. 6, Vol. 48 (1939), pp. 3–26; Bull. Math. Univ. Moscou, Vol. 2 (1939), fasc. 2.
⁴ Annals of Math. Stat., Vol. 19 (1948), pp. 177–189.


3. Derivation of the Kolmogorov and Smirnov theorems. Define

$$x_n(t) = n^{1/2} \Bigl[ \frac{\nu_n(t)}{n} - t \Bigr], \qquad 0 \le t \le 1.$$

Note that $\nu_n(0) = 0$ with probability 1, and that $\nu_n(t) - \nu_n(s)$ is the number of successes in $n$ independent trials, with probability $t - s$ of success in each trial. So $\nu_n(t) - \nu_n(s)$ has expectation $n(t - s)$ and variance $n(t - s)[1 - (t - s)]$. Hence

$$E\{x_n(t)\} = 0, \quad 0 \le t \le 1; \qquad E\{[x_n(t) - x_n(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1. \tag{3.1}$$

⁵ Cf. M. Fréchet, Généralités sur les probabilités. Variables aléatoires, Paris, 1937, pp. 260–261.
Now let $\{x(t)\}$, $0 \le t \le 1$, be a one-parameter family of random variables with the following properties:

(a) for each $j$, if $0 \le t_1 < \cdots < t_j \le 1$, the $j$-variate distribution of the random variables $x(t_1), \dots, x(t_j)$ is Gaussian;

(b) (3.1) holds, that is,

$$E\{x(t)\} = 0, \quad 0 \le t \le 1; \qquad E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1; \tag{3.1'}$$

(c) $\Pr\{x(0) = 0\} = 1$.

According to the central limit theorem, the $j$-variate distribution of $x_n(t_1), \dots, x_n(t_j)$ is asymptotically that of $x(t_1), \dots, x(t_j)$. In fact, the normalizing factor $n^{1/2}$ in the definition of $x_n(t)$ and the choice of means and variances in (3.1′) were made precisely to bring this about. As far as first and second moments are concerned, the $x_n(t)$ and $x(t)$ processes are identical. When $n \to \infty$ the distributions, or at least the $j$-variate ones mentioned, become identical also.

We shall assume, until a contradiction frustrates our devotion to heuristic reasoning, that in calculating asymptotic $x_n(t)$ process distributions when $n \to \infty$ we may simply replace the $x_n(t)$ processes by the $x(t)$ process. This clearly cannot be done in all possible situations, but let the reader who has never used this sort of reasoning exhibit the first counterexample.

The $x(t)$ process has continuous sample functions (cf. Appendix). Define

$$D = \operatorname*{Max}_{0\le t\le1} |x(t)|, \qquad D^+ = \operatorname*{Max}_{0\le t\le1} x(t), \qquad D^- = -\operatorname*{Min}_{0\le t\le1} x(t).$$

Then, in accordance with our substitution principle, $n^{1/2}D_n$, $n^{1/2}D_n^+$, $n^{1/2}D_n^-$ have as $n$ becomes infinite the distributions of $D$, $D^+$, $D^-$ respectively. The latter two are the same because the $-x(t)$ process is stochastically identical with the $x(t)$ process. Thus these simple qualitative considerations have led to the existence of the limiting distributions derived and evaluated by Kolmogorov, who proved:

THEOREM⁶ (Kolmogorov).

$$\lim_{n\to\infty} \Pr\{n^{1/2} D_n > \lambda\} = 2 \sum_{\nu=1}^{\infty} (-1)^{\nu+1} e^{-2\nu^2\lambda^2}, \tag{3.2}$$

$$\lim_{n\to\infty} \Pr\{n^{1/2} D_n^+ > \lambda\} = \lim_{n\to\infty} \Pr\{n^{1/2} D_n^- > \lambda\} = e^{-2\lambda^2}. \tag{3.3}$$

To complete our treatment we shall prove in the Appendix that

$$\Pr\{D > \lambda\} = 2 \sum_{\nu=1}^{\infty} (-1)^{\nu+1} e^{-2\nu^2\lambda^2}, \tag{3.2'}$$

$$\Pr\{D^+ > \lambda\} = \Pr\{D^- > \lambda\} = e^{-2\lambda^2}, \tag{3.3'}$$

so that in fact the above considerations have led not only to the existence but to the evaluation of the asymptotic distributions. (Actually we shall prove somewhat more general results about the $x(t)$ process.)

⁶ In Feller's paper (loc. cit., p. 178, equation (1.4)) the factor 2 in the exponent was omitted by the printer. The same misprint occurs in Smirnov's table of the values of the series in our (3.2), Annals of Math. Stat., Vol. 19 (1948), pp. 279–281.
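The limit (3.2) can be compared with a simulation at moderate $n$. The Python sketch below evaluates the series on the right of (3.2) and computes $D_n$ of (2.2′) for uniform samples. It uses the standard fact that the supremum is attained at an order statistic $u_{(i)}$ (or just below it), where it equals $\max(i/n - u_{(i)},\, u_{(i)} - (i-1)/n)$. Then it estimates $\Pr\{n^{1/2}D_n > \lambda\}$ by Monte Carlo. The seed, $n$, the number of replications and $\lambda = 1.36$ are illustrative choices, and the agreement is only up to sampling error and finite-$n$ bias.

```python
import numpy as np

def kolmogorov_tail(lam, terms=100):
    """Right-hand side of (3.2): 2 * sum_{nu>=1} (-1)^(nu+1) exp(-2 nu^2 lam^2)."""
    nu = np.arange(1, terms + 1)
    return 2.0 * np.sum((-1.0) ** (nu + 1) * np.exp(-2.0 * nu**2 * lam**2))

def d_n(sample):
    """D_n of (2.2') for observations already transformed to (0, 1)."""
    u = np.sort(sample)
    n = len(u)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

rng = np.random.default_rng(0)          # fixed seed; n, reps, lam are illustrative
n, reps, lam = 500, 2000, 1.36
scaled = np.array([np.sqrt(n) * d_n(rng.random(n)) for _ in range(reps)])
mc = np.mean(scaled > lam)
print(mc, kolmogorov_tail(lam))         # both should be near 0.05
```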

So much for the Kolmogorov theorems. Smirnov obtained results of a somewhat different nature, also independent of the given continuous distribution function $F(\lambda)$. Let $x_1^*, x_2^*, \dots$ be mutually independent random variables with the same individual distributions as the $x_j$'s, that is, each distributed uniformly in the interval $(0, 1)$. Define $\nu_n^*(\lambda)$ as the number of the first $n$ of the $x_j^*$ which are $\le \lambda$. Smirnov considered the difference between empirical distribution functions,

$$D_{mn} = \operatorname*{L.U.B.}_{0\le\lambda\le1} \Bigl| \frac{\nu_m(\lambda)}{m} - \frac{\nu_n^*(\lambda)}{n} \Bigr|,$$

as well as $D_{mn}^+$ and $D_{mn}^-$ defined in the obvious way. To avoid stressing the obvious we consider only $D_{mn}$.

THEOREM (Smirnov). If $m, n \to \infty$ in such a way that $m/n \to r$, and if

$$N = mn/(m + n),$$

then

$$\lim \Pr\{N^{1/2} D_{mn} > \lambda\} = 2 \sum_{\nu=1}^{\infty} (-1)^{\nu+1} e^{-2\nu^2\lambda^2}. \tag{3.4}$$

To derive this result, define an $x^*(t)$ process stochastically identical with the $x(t)$ process but independent of it. Define $x_n^*(t)$ by

$$x_n^*(t) = n^{1/2} \Bigl[ \frac{\nu_n^*(t)}{n} - t \Bigr].$$

In accordance with our heuristic principle, we then identify the process with variables

$$\{x(t) - r^{1/2} x^*(t)\}$$

with the one with variables

$$\{x_m(t) - (m/n)^{1/2} x_n^*(t)\}.$$

This identification implies that the distribution of

$$N^{1/2} D_{mn} = \Bigl( \frac{n}{m + n} \Bigr)^{1/2} \operatorname*{L.U.B.}_{0\le t\le1} \Bigl| x_m(t) - \Bigl( \frac{m}{n} \Bigr)^{1/2} x_n^*(t) \Bigr|$$

converges to that of

$$\Bigl( \frac{1}{1 + r} \Bigr)^{1/2} \operatorname*{Max}_{0\le t\le1} \bigl| x(t) - r^{1/2} x^*(t) \bigr|.$$
Now the $x(t)$ process and the process with variables

$$\Bigl\{ \frac{x(t) - r^{1/2} x^*(t)}{(1 + r)^{1/2}} \Bigr\}$$

are stochastically identical, since both are Gaussian with mean 0 and the same covariance function. Hence we are led to the conclusion that the distribution of $N^{1/2} D_{mn}$ converges to that of $D$, and this is Smirnov's theorem, stated above. (The method we use does not seem applicable to Smirnov's deeper theorems on the number of intersections between empirical and true distribution curves or between pairs of empirical distribution curves.)
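Smirnov's theorem (3.4) can be illustrated in the same way as (3.2). In the Python sketch below, $D_{mn}$ is computed from two independent uniform samples, using the fact that the supremum of $|F_m - G_n|$ is attained at a pooled sample point. The simulated tail of $N^{1/2}D_{mn}$ is then compared with the series in (3.4). The sample sizes ($m/n = 1.5$), the seed and $\lambda = 1.36$ are illustrative choices. The two-sample statistic is lattice-valued, so agreement is only rough at these sizes.

```python
import numpy as np

def kolmogorov_tail(lam, terms=100):
    """Right-hand side of (3.4): 2 * sum_{nu>=1} (-1)^(nu+1) exp(-2 nu^2 lam^2)."""
    nu = np.arange(1, terms + 1)
    return 2.0 * np.sum((-1.0) ** (nu + 1) * np.exp(-2.0 * nu**2 * lam**2))

def d_mn(x, y):
    """Two-sample statistic D_mn: sup |F_m - G_n|, attained at pooled sample points."""
    x, y = np.sort(x), np.sort(y)
    z = np.concatenate([x, y])
    F = np.searchsorted(x, z, side="right") / len(x)   # empirical d.f. of x at z
    G = np.searchsorted(y, z, side="right") / len(y)   # empirical d.f. of y at z
    return np.max(np.abs(F - G))

rng = np.random.default_rng(1)          # illustrative sizes: m/n = 1.5
m, n, reps, lam = 300, 200, 2000, 1.36
N = m * n / (m + n)
mc = np.mean([np.sqrt(N) * d_mn(rng.random(m), rng.random(n)) > lam for _ in range(reps)])
print(mc, kolmogorov_tail(lam))         # both near 0.05
```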


APPENDIX

4. The Brownian movement process. Consider any Gaussian stochastic process, with random variables $\{x(t)\}$, where $t$ varies in some interval. That is, we assume that for each $t$ in the interval $x(t)$ is a random variable, and that for any $j \ge 1$, if $t_1 < \cdots < t_j$ are in the interval, the $j$-variate distribution of $x(t_1), \dots, x(t_j)$ is Gaussian. In the following we shall always assume that $E\{x(t)\} = 0$. Then the process is determined stochastically by the covariance function

$$r(s, t) = E\{x(s)x(t)\}.$$

In particular, suppose the range of the parameter is the interval $[0, \infty)$ and

$$r(s, t) = \sigma^2 \operatorname{Min}(s, t), \qquad 0 \le s, t < \infty.$$

The process is then called the Brownian movement process, or sometimes the Wiener process; $\sigma$ is a positive constant. When considering this process we shall write $\xi(t)$ instead of $x(t)$. For the $\xi(t)$ process

$$\Pr\{\xi(0) = 0\} = 1, \qquad E\{[\xi(t) - \xi(s)]^2\} = \sigma^2 |t - s|,$$

and if $0 \le s_1 < t_1 \le s_2 < t_2$ the increments $\xi(t_1) - \xi(s_1)$ and $\xi(t_2) - \xi(s_2)$ are mutually independent. We shall use the following properties of this process, of which the first two are well known.⁷

(a) The sample functions are everywhere continuous with probability 1. In the following we can therefore write as if all the sample curves were continuous.

(b) For fixed $s$,

$$\Pr\Bigl\{ \operatorname*{Max}_{0\le t\le T} [\xi(s + t) - \xi(s)] \ge \lambda \Bigr\} = 2 \Pr\{\xi(s + T) - \xi(s) \ge \lambda\}. \tag{4.1}$$

(Note that the use of a general initial value $s$, rather than 0, has not added to the generality, and we drop this affectation below.)

(c) If $a > 0$, $b > 0$, $\alpha > 0$, $\beta > 0$, then

$$\Pr\Bigl\{ \operatorname*{L.U.B.}_{0\le t<\infty} [\xi(t) - (at + b)] \ge 0 \Bigr\} = e^{-2ab/\sigma^2}, \tag{4.2}$$

$$\begin{aligned}
&\Pr\Bigl\{ \operatorname*{L.U.B.}_{0\le t<\infty} [\xi(t) - (at + b)] \ge 0 \ \text{or}\ \operatorname*{G.L.B.}_{0\le t<\infty} [\xi(t) + \alpha t + \beta] \le 0 \Bigr\} \\
&\quad = \sum_{m=1}^{\infty} \Bigl\{ e^{-\frac{2}{\sigma^2}[m^2 ab + (m-1)^2 \alpha\beta + m(m-1)(a\beta + \alpha b)]} + e^{-\frac{2}{\sigma^2}[(m-1)^2 ab + m^2 \alpha\beta + m(m-1)(a\beta + \alpha b)]} \\
&\qquad - e^{-\frac{2}{\sigma^2}[m^2(ab + \alpha\beta) + m(m-1)a\beta + m(m+1)\alpha b]} - e^{-\frac{2}{\sigma^2}[m^2(ab + \alpha\beta) + m(m+1)a\beta + m(m-1)\alpha b]} \Bigr\},
\end{aligned} \tag{4.3}$$

in particular ($\alpha = a$, $\beta = b$)

$$\Pr\Bigl\{ \operatorname*{L.U.B.}_{0\le t<\infty} [\,|\xi(t)| - (at + b)\,] \ge 0 \Bigr\} = 2 \sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2 ab/\sigma^2}. \tag{4.3'}$$

⁷ Due to Bachelier; cf. the proof by P. Lévy, Comp. Math., Vol. 7 (1939), p. 293. One way to prove (a) is to prove (4.1) first, with L.U.B. instead of Max, and then use it to calculate the probabilities relevant to (a).


The probability in (4.2) is the probability that a $\xi(t)$ sample curve will ever reach the line with slope $a$ and ordinate intercept $b$. The probability in (4.3) is the probability that a sample curve will ever reach either of the indicated half-lines, one above and one below the $t$ axis. Since the right-hand sides are continuous functions of $a, b, \alpha, \beta$, we could write $> 0$ instead of $\ge 0$ and $< 0$ instead of $\le 0$ on the left. These probabilities are therefore also the probabilities that a sample curve will ever rise above the indicated line or leave the indicated angle.

It will be convenient to describe a line by its slope and ordinate intercept: the line $[u, v]$ is the line with slope $u$ and ordinate intercept $v$. We shall take $\sigma = 1$ in the proof. This is no essential restriction, since $\xi(t)/\sigma$ is the random variable of a process of the same type whose $\sigma$ is 1.

To prove (4.2), let $\varphi(a, b)$ be the probability on the left, that is, the probability that a sample curve will reach the line $[a, b]$. Let $b = b_1 + b_2$ with $b_j > 0$. A sample curve which is to reach $[a, b]$ must first reach $[a, b_1]$. It must then move up to meet a line with slope $a$ lying $b_2$ units above the point of first meeting with $[a, b_1]$. Then

$$\varphi(a, b_1 + b_2) = \varphi(a, b_1)\, \varphi(a, b_2).$$

Now $\varphi(a, b) \ge \Pr\{\xi(1) \ge a + b\} > 0$, and $\varphi(a, b)$ is monotone non-increasing in $b$ for fixed $a$. The only solution of the functional equation with these properties is

$$\varphi(a, b) = e^{-\psi(a) b}.$$
Now $\varphi(a, b)$ is the probability of reaching $[0, b]$ at some first time $s$ and then going on to the line $[a, b]$. From the vantage point of the first common point $(s, \xi(s))$, this is the line $[a, as]$. In other words, using (4.1),

$$e^{-\psi(a) b} = \int_0^\infty e^{-\psi(a) a s}\, d_s \Pr\Bigl\{ \operatorname*{Max}_{0\le t\le s} \xi(t) \ge b \Bigr\} = \int_0^\infty e^{-\psi(a) a s}\, \frac{b}{(2\pi)^{1/2} s^{3/2}}\, e^{-b^2/2s}\, ds = e^{-b[2a\psi(a)]^{1/2}},$$

from which it follows that $\psi(a) = 2a$, and this yields (4.2).
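The step $\psi(a) = 2a$ can be checked by evaluating the middle integral numerically. Write $\varphi(a, b) = e^{-\psi(a)b}$ for the probability of reaching $[a, b]$ (with $\sigma = 1$). The Python sketch below integrates $e^{-ps}$ against the first-passage density $b(2\pi)^{-1/2}s^{-3/2}e^{-b^2/2s}$ with the trapezoidal rule on a logarithmic grid. With $p = \psi(a)a$ and $\psi(a) = 2a$, the result should equal $e^{-\psi(a)b}$. The values of $a$ and $b$ and the grid limits are illustrative.

```python
import numpy as np

def first_passage_transform(p, b, s_min=1e-6, s_max=1e4, points=200_000):
    """Trapezoidal approximation to int_0^inf e^{-p s} b (2 pi)^{-1/2} s^{-3/2} e^{-b^2/(2s)} ds,
    the integral against the first-passage density obtained from (4.1) with sigma = 1."""
    s = np.geomspace(s_min, s_max, points)
    y = np.exp(-p * s) * b / np.sqrt(2 * np.pi) * s ** -1.5 * np.exp(-b**2 / (2 * s))
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(s))

a, b = 0.7, 1.0                 # illustrative slope and intercept
psi_a = 2 * a                   # candidate value psi(a) = 2a
lhs = np.exp(-psi_a * b)        # phi(a, b) = e^{-psi(a) b}
rhs = first_passage_transform(psi_a * a, b)
print(lhs, rhs)                 # both approximately e^{-1.4} = 0.2466
```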

To prove (4.3) we consider first the following general problem. Let $[u_1, v_1], [u_2, v_2], \dots$, with $u_j > 0$, $v_j > 0$, be a sequence of lines, and define the times $t_1, t_2, \dots$ as follows:

- $t_1$ is the first value of $t$, if any, at which a sample curve meets $[u_1, v_1]$;
- if $t_1$ is defined, $t_2$ is the first value of $t > t_1$, if any, at which the curve meets $[-u_2, -v_2]$;
- if $t_2$ is defined, $t_3$ is the first value of $t > t_2$, if any, at which the curve meets $[u_3, v_3]$; and so on.

Let $\pi_n$ be the probability that there is a point $t_n$. In other words, $\pi_n$ is the probability that a sample curve meets the lines $[u_1, v_1], [-u_2, -v_2], \dots, [(-1)^{n+1}u_n, (-1)^{n+1}v_n]$ in at least $n$ successive points. We write

$$\pi_n = \pi_n(u_1, v_1; \dots; u_n, v_n).$$

In particular, according to (4.2),

$$\pi_1(u_1, v_1) = e^{-2u_1 v_1}. \tag{4.4}$$


To evaluate $\pi_n$, let $Q$ be the point $(t_{n-1}, \xi(t_{n-1}))$ on the sample curve, and suppose for definiteness that $n$ is even. Starting at $Q$, if there is a $t_n$, the curve must finally reach $[-u_n, -v_n]$. That is, it must go to a line of slope $-u_n$ which, at $t = t_{n-1}$, is $u_{n-1}t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n$ units vertically below its initial position $Q$. According to (4.2) the probability of doing this is

$$e^{-2u_n(u_{n-1}t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n)}.$$

Now we replace the line $[-u_n, -v_n]$ by a line which depends on $t_{n-1}$ but which leaves this probability unchanged. The new line has slope $-(u_{n-1} + u_n)$ and lies

$$h = \frac{u_n}{u_{n-1} + u_n} (u_{n-1}t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n)$$

units below $Q$ when $t = t_{n-1}$. Finally we reflect this new line in the line parallel to the $t$ axis through $Q$. These two changes do not affect the probability we are discussing, because the changes of $\xi(t)$ after $t_{n-1}$ are independent of the changes before and have symmetric distributions. The final line has slope $u_{n-1} + u_n$ and lies $h$ units above $Q$ when $t = t_{n-1}$; it is the line

$$\Bigl[ u_{n-1} + u_n,\ \frac{u_{n-1}v_{n-1} + u_n v_n + 2u_n v_{n-1}}{u_{n-1} + u_n} \Bigr],$$

which does not depend on $t_{n-1}$. This line lies above $[u_{n-1}, v_{n-1}]$ in the first quadrant, so if a sample curve reaches it, the curve must also intersect $[u_{n-1}, v_{n-1}]$. We have thus proved that

$$\pi_n(u_1, v_1; \dots; u_n, v_n) = \pi_{n-1}\Bigl( u_1, v_1; \dots; u_{n-2}, v_{n-2};\ u_{n-1} + u_n, \frac{u_{n-1}v_{n-1} + u_n v_n + 2u_n v_{n-1}}{u_{n-1} + u_n} \Bigr). \tag{4.5}$$
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The fundamental identity (4.5) makes it possible to reduce the evaluation of 
wT, tO 7 in n — 1 steps; 7m is evaluated in (4.4). Thus successive meetings with n 
lines have been reduced to a meeting with a single line. As a first example suppose 


Uy = +++ = Un = U, y= =U =e. 


Then we have 


Tn(U, Ve °°° 5 U,V) = Ta(U, Vv; -°* 3 Qu, Qv) = oe> 
= m(nu, nv), 
so that 
(4.6) rr(U, 0; +++ 3u,v) =e", 
More generally suppose 
Uy = Uz = °° = 4, 1 = 03 = +--+ = Ob, 
Up = UW = +--+ =a, Vo = %4 = --+ = B. 


Then we show that for suitably chosen C$”’s we have according as n is even | 
or odd 
(n) > (n) y(n) y(n) 
nr Cy"ab + Cz aB + C3"'a8 + Ce'ab 
mn(a,b; +++ ;a,8) = m| 5 (a +a), aan ; 


(a + a) 





wis 


mn(a, b; +++ 3a, b) 
(4.7) 


n+ 1 n—1 Cy"ab+ Cy” aB + C$’ aB + Cy”’ad 
: a ; 
> 


») 
“ “ n 1 n+ lea 
+ “+ 


For n = 1 this form is correct with 
y(1) yl v1 v1 
4) = 1, Cy) = Cy’ = Cy’ = 0. 


If now v is even and if the equations are true for n, 


\ CoB + Cy” ab + Cz"ab + Cy? a8 








n 
Trii(a, b; +++;a,b) = me fa, b; z & + @), 
2 n 
a a — a 
n+2 n abtcm + CO ad + CHrad + CLaB + nla + ab 
= ae ae a + 3 See a + = a ne ae 





n 
9 atse 


and comparing this with (4.7) we find that 


Y 1 ) 
cr = Co” +2 +1, 
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cyt) = ci. 

ce) = Cf, 

ce = oO +n, 
If n is odd we find similarly that 

ce - cy" + n, 

cy a cy". 

go ‘ass cy", 


com =cm +n+1. 


(n even). 


The solution of these equations is 








n even n odd 
C a}... n co” _ (n + 1)° 
1 — Se 1 ee 
+ 4 
n) n (n) (n — 1)° 
ce «=. CH a Bm 
4 4 
9 
oo” n(n ai 2) op dion. 
SS ee CG; = 
+ 4 
y(n) n(n + 2) y(n) ~~] 
Cy SS Cy = 
4 4 
Then 
= e ilntab + n2a8B + n(n — 2)a8 + n(n + 2) ab) (n even), 
(4.8) j ? 
. —} 1)2ab + — 1)2 2— 1)a8 + (n2 — 1) ab] 
mm =e i[(nm + 1)2a (n )*aB + (n a a (n odd). 


We can now prove (4.3). In fact the left side is equal to 
™(a, b) > T™(a, 8) re 72(a, b; Qa, 8) poe. T2(ar, 8; a, b) 5 de 


which gives (4.3), on substituting (4.8). Only (4.3’), which follows from the 
simple (4.6), is used in the application to the Kolmogorov-Smirnov theorems. 


5. Transformations of Gaussian processes to the Brownian movement process. 
The ¢(t) process studied in section 4 is so simple that it is important to be able to 
reduce others to it by elementary changes of variable. For example if the co- 
variance function of a Gaussian process has the form 


(5.1) r(s, t) = u(s)v(E), s<t, 
for s, t in some interval, and if the ratio 

At 

u(t) _ a(t) 


=” 
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is continuous and monotone increasing, with inverse function a(t). We define 


ulai(t)] 
bad t = ee 
60 = ax” 
With this definition the ¢ process is Gaussian and since if s < ¢ 
er [ai(s) ela] : 
Eig(e(Q} = SaSear”’ = ala(s)] = s = Min(s, d), 


the ¢ process is the Brownian movement process with ¢ = 1. This transformation 
from the x to the ¢ process is effected by a combination of a change of variable 
in t and the application of a variable scaling factor. (Conversely, if such a trans- 
formation is applied to the Brownian movement process it is trivial to verify 
that the new covariance function will have the form (5.1). The Gaussian processes 
with covariance functions of this form are easily seen to be the Gaussian Markov 
processes. ) 


6. The Gaussian process with r(s, t) = s(1 — t). In section 3 the Kolmogorov- 
Smirnov theorems were reduced to properties of a Gaussian process with parem- 
eter t,0 < ¢ < 1, for which 


Priz(0) = 0} = 
E{x(t)} = 0; 
E\{x(t) — e(s)P} = t —sl —(t-—s)], O<s<t<1. 
Now these equations imply that 


Ejx(t)’} = t(1 — 0), E{x(s) 


9 


} = (1 — 5), 
and combining the set we find that 
r(s, t) = Ef{x(s)z(t)} = s(1 — 0), 0O<s<t<l. 


This covariance function has the form studied in section 5, and using the trans- 
formation of that section 


= -@+02(-4), 0<t< », 


defines a Brownian movement process (with ¢ = 1). Then if D, D’, Dare 
defined as in section 3, we have from (4.3’) 


f a oe 
Pri{D > vr} = Pr<L. U. B. | £O_ > a} > (rer 
\ oxtcowo |?4+ 1 J : 


and from (4.2) 
PriD* >} = Pri{D>yAj =e. 
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This proves (3.2’) and (3.3’). Note that we could go beyond these results, because 
of our detailed knowledge of the x(t) process. For example we can evaluate 


lim Pr{(n)*D7 <1, = (n) DE < do}. 


no 


If A: = Ao = A the probability is the probability that (n)'”D, < \ which we have 
already treated. In general it is, in the limit, 


Pr{Min 2(f) => — A, Max a(t) < Ad} 
0<t<1 


0<t<1 
N 
() (1) | 
= Pricip. $9 > —~y, us. $0 <y,! 
O<t<a t+ ‘1 O<t<a t+1 ) 
00 
— Zz to “2 [m2n? +(m— —1)?AF +2m(m—1)d 1X2] + ¢ —2[(m—1)2 *A2 +m? 2nt +2m(m—1)d iho] 
at ' 
m=) 
ae eo almeat +h2) +m(m —1)A1A9 +m(m+1)A 1X9] en Rlmt StS) +en(m+1)A Ae +e(m—1)d1A2] ) 
2 — 2 j 
oo 
at jg ete 4, ee a De 2m Or the)? y 
oe ( & ) 
m=) 


obtained_by setting a = b = dx, a = B = dy in (4.3). 








PEARSONIAN CORRELATION COEFFICIENTS ASSOCIATED WITH 
LEAST SQUARES THEORY 


By Paut S. Dwyer 
University of Michigan 


1. Introduction and summary. It is well known that the zero-order correlation 
between the predicted value of a variable and the observed value of the variable 
is the multiple correlation. It is also well known that the zero-order correlation 
between the residuals for two different variables, when the prediction is from a 
common set of variables, is the partial correlation. These considerations naturally 
lead to a systematic investigation of all the zero-order correlations involving 
the various variables associated with least squares theory. Such an investigation 
is the purpose of this paper. 

As a result of this study it appears that other zero-order correlations include 
the multiple alienation coefficient, the part correlation coefficient, and certain 
other coefficients which, as far as I am aware, have not been previously defined. 

The paper first examines the case of a single predicted variable and then 
continues with the case in which two or more variables are predicted simultane- 
ously. The paper includes (1) a theoretical development of the different coefti- 
cients and the relations between them, (2) the expression of the formulas in 
determinantal form, (3) a matrix presentation of the material, and (4) an outline 
of the calculational techniques——with illustrations. 

It should be made clear at the start that this paper deals with populations 
(finite or infinite) and not with samples from those populations. The sampling 
distribution of each of the new correlation coefficients defined in this paper 
might well become the subject of a later investigation, but first we need to 
know what these correlation coefficients are. 


2. The case of the single predicted variable. Notation, definitions, and basic 
properties. We suppose that a population consists of N individuals with values 
X1;, X2;,°**, Xx;, Y; for the variables X; , X2,---, X;, Y and that Y is 
linearly predicted from the X ; by the formula 


(1) E = Y — a — aX. — aeXe — +++ — aX, = Y-— Y 
by least squares theory. For the purposes of this paper, we use a concise summa- 


N 
tion notation, =Q, in place of the more formal serial notation 2 Q; which is 


i=l] 


b 
preferable to the frequency notation >, Q.f: and, in the continuous case, 


z=a 
b 
/ Q.f.dx Moreover it is desirable that the scales of X and Y be chosen so as 
a 
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to facilitate the easy determination of the various formulas. If we let 





~ cutee’. , Mek 
VNoy V Noz; 
we have Sa; = Sy’ = 1 with the resulting correlating formula 
(3) Poy = — eels = lay and ps2, = ©2;%;. 
V (22%) (Zy’) 
The transformations (2) when applied to (1) give 
(4) c= a =y— Gitit Bates + him) =y-y 


where the 8’s are standard regression coefficients and e is defined to be Now’ 
V iNOy 
It is to be noted that the values of x;, y, e, and y are all dimensionless. 
The values we wish to correlate are those of X;, Y, E, Y of (1). The zero-order 
correlations involving these are the same as for x;, y, e, y of (4). 


3. Correlations with a single predicted variable. We wish to minimize 
Le’. Differentiating with respect to 6; and equating to zero we get 
(5) Lex; = 0 
from which by multiplication by 6; and summation for 2, 
(6) Sey = 0. 
It follows that 


(7) Zé = Sey — y) = Sey = Vly — yy = Sy — Lyy 


I 
at 

| 
M 

< 

KS 


Using (4) and (7), we get 





na 9 
vw 2 XE OF S, 2 
Le = —, = = =1 — by 
Noy Oy : 
so that 
9 9 
12 Oy — OF 
Ty? — 
(8) Ly = —;—- 
Oy 


This is the conventional definition (from least squares theory) of the multiple 
correlation coefficient, so 


(9) Pyeyoy---my = Pye) = Ly = Lyy. 
Application of (9) to (7) gives 
(10) Se’ =i = Pyiz) - Ky(2) = oe ote Zk 
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. 
where Ky(2) is the multiple : alienation coefficient. We now have =x; = 1, Zy’ = 1, 


Ze” = kya), and zy = py(z), So that we are able to present hides involving 
Xi, y, €, y. We first form the cross products 


(11) Zxry = Pry, 

(12) =re = 0, 

(13) try = Urly + e) = Lry = pry, 

(14) Zye = Zy(y — y) = Ly — Syy = 1 — Sy = Kye, 
(15) Syy = Ly" = py, 

(16) Ley = 0. 


We then have 





BSy.- 
(17) ol iv; a Se : 
7 Pe.2z; = > = 22; %;. 
1 4/(S2%)(Er") 
(18) a 
Pz = a = ai, 
" V (S23 i) (Zy") 
(19) axe 0 
Pre  A/(S22)(Set) 
2ry _ =ry Pry 
(20) iMehm- oo. 
2?)(> y?)  Py(z) Py(z 
It is interesting to note that this is unity in case / = 1 for ~_ Pry = Py». Other- 
wise the absolute value of pz, is larger than that of p,,. For this reason this 


coefficient might be called the multiple augmented correlation cocflicient. 


Ley Ki(z) 
(21) Peay = 7 er eosk. = Ky(z)- 
V (Ze*)(Ly*) Kuz) 
Thus the correlation between y and its residual is the multiple alienation coeffi- 
cient. 
San 
Syy 7 
(22) => = VY >y? = pyz)- 
Py V/ (Sy?)(Z 2 Y Y\ 
Thus, as is well known, the zero-order correlation between observed and pre- 
dicted ¥ is the multiple correlation. 
. Ley 
(23) fey = = 0. 
Vv ‘(Se (sy ) 


4. Notation for the general case. We need to extend the notation and the 
definitions before examining explicit formulas for the more general case of two 
(or more) predicted variables. Suppose that Y; and Y; are the two variables 
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predicted from the same X’s. Then from (+4) we write 


E 
4. 
+4 


7 Vv Noy, = Yi — Bati — Bike — +++ — Bate = Yi — Yi 
(24) . 
43 

™ V/ Noy, = Yj — Bart — Bjete — +** — Bute = Yj — Y;- 


We then have the two sets of normal equations 

(25) zex = 0 ze;x = 0 

so that 

Sey; = 0 Ley: = 0 

(26) : 

Sey; = 0 Sejy; = 0. 

It follows that 

7) See; = Leilyi — yi) = Ves = D(yi — ydys = Zyys — Uy; 
= Zyys — DyYs = =yys — LYys = pis — DY; 

if we use the notation that pi; = pyjy;. 


5. The correlations nang more than one predicted variable. In this case 
the y’s, the e’s and the y’s (as well as the x’s) can have more than one variable 
NX) ai the correlation coefficients we need, in addition to those of section 3, are 


Puinss Pesess Pusujr Pusejs Pujers Puivjr Puivjr Pesvjr ANA py;e;, We need now only the 
summed products 


vy ome =_— oa 
<YiYi = Pysuy = Piiy 


(28) 
See; = pis — DYY; as given in (27), 
(29) Dyes = Syilys — ys) = Sys — Dyys = pis — ys, 
(30) ZYiYi = ~YYis 
(31) Ley; = 0. 
We have then 
Yi Yi 
(32) ——>— = Lyiy;, 
~ V (Sy?) (2y7) 

2eie; Pig — YY; 
(33) my? 

(Xej) (Ze}) K i(2)Kj(z) 


This is the partial correlation coefficient. 


Lyi Yi Lyi Yj 
(34) Puss; = : ‘ai: ite 


V (rye Yi )(Sy?) Pi(a)Pj(z) 
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This coefficient appears to be new. Since it is the correlation of predicted values, 
[ suggest that it be called the predictions correlation coefficient. 


=) ZYil; pis — Tycy; 
(35) Pye; = - Faas ; a —_— =< , 
¥ (Xyi) (Xe } ) Kj (x) 
Ze: Yj a =Yi Yi 
(36) enw ee 
V(z € 1) (Ly) Ki(z) 


The correlations given by (35) and (36) have been defined previously and are 
known as part correlation coefficients [1; 213,497]. 


7 Yi Yj Lyi Yi 
(37) Pui; = V3 - = = = — =. 
(Sy?) (Sy?) Pj(z) 
: Lyi Yi Lyi Yi 
(38) Puy; = : = . 


V (Ey?) (Ey?) Yi 3) (Zy}) Pi(z) 
The correlations of (37) and (38) appear to be new. Each is, in a sense, a generali- 
zation of the multiple correlation coefficient since it becomes the multiple cor- 
relation coefficient when i = j. I suggest that it might be called the cross multiple 
correlation coefficient, since it correlates the actual value of one variable with 
the predicted value of another. 











~ei) ; 
Pes; = aa == = @, 
V (Sei) (Zyi) 
(39) : 
ZYi ej 
Pusey = -=0. 


V (Zy?) (Ze3) 
A summary of definitions and names of Pearsonian correlation coefficients asso- 
ciated with least squares theory is presented in Table [. No name is proposed 
when the coefficient is identically zero. 


6. Relations between the correlations. Many relations exist between the 
correlations defined in earlier sections. Some of the more interesting of these 
are obtained by the elimination of =y,y; from formulas involving this term.Thus 
from (34), (37), and (38) we get 

ZY; = Puvuj;Pi(z)Pi(e) = Puiuj;Pi(z) = PusyjPi(z)s 
and from (33), (35), and (36) we get 
Pij — LYYi = Pece;Ki(a)Kj(z) = Pyye;Ki(z) = Peyy;Ki(z)- 


We then have 


* f 
Pij — Pusu; Pilz) Pj(a | | Peje; Ki(z) Kj(2) 
(40) Piz ~~ Pusu; Pi(z) = { Pye; Kj(2) 
Pij — Puiv; Pilz) ) Peiy; Ki(z) 


where the six members may be equated in all possible ways. 


TT 
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Interesting and simple relations can also be obtained by formation of ratios. 
Thus 














Pec; 1 
(41) — _— so a 
Rese; 1 Peivj Kia) 
Peiy; Kj (z) 
TABLE I 
Definition | Name 
Single predicted variable 
Pz2; | Correlation coefficient of zero order 
en | Correlation coefficient of zero order 
Pre = 0 None 
Psy = Ps *Multiple augmented correlation coefficient 
Pyz 
Pye = Ky(z) Multiple alienation coefficient 
Puy = Pyiz) Multiple correlation coefficient 
Pey = 0 None 
Two or more predicted variables 
Pussy; | Correlation coefficient of zero order 
Pose; Partial correlation coefficient 
Pusu; _ *Predictions correlation coefficient 
Pose; Part correlation coefficient 
Pris; *Cross multiple correlation coefficient 
P eiy; | None 


* Proposed name 
Similarly 
Pyiuj Pj(z) 


The geometric mean of similar coefficients yields such expressions as 


V tense = Pee; V ki(2) Kj(z) 
(43) 





V Puss; Puiyj = Puiu; V Piz Pj (2) 


7. Determinantal formulas. The implicit normal equations (5) become when 
expanded 


Pudi + prxBe + +++ + pub. = pry 
(44) papi + pr23_ + pork: = Poy 
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. 9 9 
while Zyy = Ly = py z) becomes 


(45) pur + py + +++ + pyibe = pice 


Let A be the determinant of the matrix of the solution of the I: x's and y. Let A’ 
be the corresponding determinant with p,, replac ed by Pua): Let Ay be the 
determinant of the correlation matrix of the i 2’s. Then py) = sy = Zyy can 
be expressed as a function of A and A,,. If (44) and (45) are to hold simultane- 
ously, then A’ = 0. Expanding A’ in terms of the bottom row, we get 


(46) A’ = 0 = pia Ay + “terms”. 
Similarly 
(47) A = py Ay + “terms” 


where the ”’terms‘ of (46) and (47) are identical. It follows by subtraction that 
A = (1 — pyz)) Ay, and hence that 


. 2 A 
(48) Lyy = Ly = Pete) =-l|-— —. 
Ay 
Then 
9 9 9 ~ 9° A A 
(49) Le = Ley = kya =l— ty =1- (: _ a) = —., 
- wy Avy 
Correlation formulas of section 3 then appear as 
2 Px 
(50) ey = eee 
,-A 
a on 
a 
a1 Se = = 9 
(5 ) Pey rel 





, dell tn 
(52) Pyy = / a 


In a similar way the normal equations (25) become two sets of normal equations. 
The first set is like (44) with 8, replaced by 8,;, and p,, replaced by p,,,. The 
second set is similar with 7 replaced by j. It is desired to find 


(53) LYyYi = LYYsi = PyjrGi + pyj2Be + +++ + pyr. 


Now using (53) with (51) as applied to y; and using the technique of the first 
part of this section, we get 


= bibidiaaanD? 

(54) Ayn; = Puiu; Avivi-vjv; + “terms”, 
. ce ° ave 
(55) O = LywAyivi-vjy; + “terms”, 


where A is the determinant of the matrix of the correlations of the /: x’s, y; and 
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Yj; Aviv; Is the determinant obtained by deleting the column involving correla- 
tions of y,; and the row involving correlations of yi; Ay;y;.y,,; is the determinant 
of the matrix of the k x’s; and the “terms” in (54) and (55) are identical. It 
follows that 


5 Ag iui 

(56) Lyi Yi = pij — ssiiniatalea la 
Ayivi-vini 

and thence 

a Vid; 

(57) py — Lysy;s = ML. 
Avivi-viu; 


The formulas of section (5) then appear in determinant form as follows 





Ai 
(58) sak icra liens cay, seta 
Cc egej y/ (2° \(2*) V/ Mii Bis 
pF Aj. ii 
as is well known. 
Ay 
ij E 
A. 
(59) Puy; = = ii “ 
l =a ) (1 a 22 ) 
V ( iii Ai ii 
Ai 
Aisi 
(60) Pye = : 
PY) / he 
V iia 
ie 
| 
, Ag. 
(61) Pers; = ae = 


San eee 
Vy eed 
Formulas for p-,y; and py;y; are similar to (60) and (61). 


Modern methods of calculating determinants (2), (3), (4), (5) are advised if 
calculations are to be made from those formulas. 


8. Matrix formulas. A matrix presentation is very useful in exhibiting the 
general features of this theory and in developing compact and easy methods 
of calculation with finite populations. The matrix presentation here is similar 
to that given by the author in a previous article [6]. 

Let the normal equations (24) be represented by the matrix equation. 


(62) E=Y-—-XB=Y-Y. 
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Then the sets of normal equations become 
X‘E =0 or X’(¥Y — XB) =0 


so that 


(63) X'XB = XY. 

Now since XB = Y, (63) can be written as X’Y = NX’Y and it can be shown 
that 

(64) Y'Y = YY = Y’'yY. 


But under the assumptions of section 2, Y’X is the matrix of the intercorrela- 
tions of the X’s, X’Y is the matrix of the intercorrelations of the z’s and y’s 
and Y’Y is the matrix of the intercorrelations of the y’s. Hence (63) can be 
written 


(65) RizB = Rey 
so that 
(66) B= RzRy. 


If Y is composed of a single variable, B is a single column matrix (vector) 
but if Y is composed of m variables, B is an m column matrix. It follows at 
once that 


(67) Y’Y = Y'Y = BX'XB = B’R,,B = RYRaRzRa Ry = RyRaRy 
and that 
E'E = (Y — XB)E = Y'E = Y'(Y — XB) = Y’Y — Y'Y 
= Y'Y — y’y = R,, — RRARw. 


It thus appears that the matrix (67) has diagonal terms > y = yy which are 
the squares of the multiple correlation coefficients, and that the non-diagonal 


(68) 


terms are Syy; = Lyiy;. Similarly the matrix (68) has diagonal terms Se = 
2 es : . _ a) ° 
kKyz) and non-diagonal terms LYe,e; = Lewy;. It follows that all the correlation 


coefficients defined above may be calculated from the matrices R.z, Riy, Ryy, 
Y’Y, and E’E. The matrix (67) might be called the multiple correlation matrix 
and the matrix (68) the multiple alienation matrix. 

Conventional results are expressed in terms of the correlation matrices F,., 
R.,, and R,,. All the correlation coefficients defined in this paper may be ex- 
pressed in terms of these matrices and the multiple correlation and alienation 
matrices. 


9. Calculational method of determining the multiple correlation and multiple 
alienation matrices. Various methods might be used in calculating the multiple 
correlation and alienation matrices from the correlation matrices. One method 
utilizes the square root method of solving simultaneous equations, which has 





re 
wl 
m 


wl 


(7 


— 


rr T™]! -—- 


am ss ae = & 


= Te “ae 


5 
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recently been presented in a number of places, [7] [8] together with a device 
which is similar to that used by Aitken [9] in eliminating the back solution. This 
method solves the equation (65) by forming the auxiliary 


> ’ ’ —1 
(69) S..B = S,.ht.2 Rey 
where S,; 1s a triangular matrix such that 


(70) Rez —_ S22Ses = 0). 


TABLE II 





General Illustration 








| Ry | | | 1.000 | .495 
| | — | 1.000 





| 1.000 | .652| .554|) .615] .313 | .650 
| — | 1.000} .747| .693| .280| .803 
Rus ime — | — | 1.000} .774]| .182| .804 
| | — | 1.000] .166| .812 

| 1.000 | .652| .554| .615] .313 | .650 
| 758 | .509 | .385] .100| .500 

ies | SieRre Rey | | .659 | .360] .064| .287 
| | .586] .072]| .199 











| | | 117 | 221 











| YY | | | — | .794 
| | | | .883 | .274 
E'E | | | — | .206 


The right hand side of (69), when premultiplied by its transpose yields 
1) «(40h (SER) = BESSA. HR, = Rak, @ YY. 


Speaking less technically it is only necessary to multiply the columns of 
S,.Rz:R., to get Y’Y. 

A first illustration utilizes the correlations of the Carver anthropometric 
data [10] for 1000 University of Michigan freshmen. This group may be regarded 
as constituting a population, or it may be regarded as a random sample of a 
larger population. For present purposes we regard it as a population. Height 
(Y:) and weight (Y2) are estimated from shoulder girth (X,) chest girth (X2), 
Waist girth (X3), and right thigh girth (X,). The calculation of Y’Y and E’E 
from the correlation matrices follow. 
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As a second illustration I use the correlation between the parts of two forms 
of the Thorndike Intelligence Examination which Lorge has used in illustration 
canonical correlation technique [11, 69-74]. The X’s are the scores on the three 
parts of Form‘A and the Y’s are the scores on the three parts of Form B. In 
this case we designate the results by r’s and k’s (rather than p’s and «’s) since 
the calculation is considered to be for a sample. The calculation of the sample 
multiple correlation and‘multiple alienation matrices is presented in Table II. 


TABLE III 


























Form A | Form B 
X1 X2 | X3 | ; yt | yo | ys | 
stenneacomniamnanaant —" ee | _ — es — a - a 
| | 
| | 1.0000 | .8235 | 7912 
| 
| — | 1.0000 | .8315 | R,, 
;} — | — | 1.0000 
1.0000' .7830) .7852) .8986  .7841 | .S82I7 | 
es — 1.0000 .8393 | 1961 $543 .8254 | Ry 
-— — | 1.0000| .7683  .8226  .8588 | 
1.0000, .7830 .7852| .8986 | .7841 8217 | 
Ses .6220, .3609, .1487 .38ti4 206 | 5,f.ha 
.9032;  .O180 . 1341 21416 
.8299 TO45 7808 
— 7821 7861 | y’yY 
— = S069 
1701 0590 0054 
- .2179 0154 | B’E 
i= — 199] 
10. The numerical values of the coefficients. Tire diagonal entries of 
multiple correlation matrix give the values of Lyi = Lyiyi = pyiz) While the 


non-diagonal values are Zyi yy = Lyi Yi. 
alienation matrix are Se; = Ley; = ky) While the non-diagonal entries 
then able to write out 


wee; = Ze; = 


mY s€ 5. 


Pi(z) = 
Pa = 


Ki(z) = 


We are 
easily. Thus from Table IT 


V/X1 


Vv 
Vv 


| «| 
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= 14/117 = 
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kn) = V3e2 = V7.206 = 454, 
a = 643 
~ 4/(Se2)( De?) V/(.883)(.206) 
Lyiye 221 794 
Wp = : = = SF = =, a 
Pe Veayysy VW C1N7)C794) 
Zee2 274 
, oo Se oe ee we 
Puce = V5a vV/206 
Deiee 204 
n= Faun = 291, 
Pures 4/Set vi 883 
= = ,——_ = 248, 
Puiue V>y3 Tai 
Zyy2 221 
= —— = —== = 646. 
Puy V Sy} /.117 6 
TABLE IVa 
General Sin 
Tyyye Turu3 .9489 -9603 
tes Tose | Tein — 9110 .8392| .8644 .8626! .8747 
Syi Lyy» Lys .8299 .7645 .7858 
ry .9917 
l2(2) T yous ‘yous . 8844 . 8889 8751 
Zy2 Lys 1821 1861 
T3(z) .8983 
Dy3 8069 
TABLE IVb 
7 General Illustration 
Peres _— | .3066 = .0298 
hice) Cie 1 Gy | ty 4 Tem 4124) .1431 | .1264) .0131| .0123 
Le; Deree | re,e, 1701 .0590 | .0054 
wail wei — —_ 
Pailin | | 2214 
Kae tu. Pie 1 4668 | .0973 | .1033 
Des Less | .2179 | .0454 
| kacz) | .4394 
Les | .1931 
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It is possible to utilize a scheme of successive division if all these correlations 
are desired when there are more than two predicted variables. By divisions we 
compute in turn piz), Pyiy;, Puiv; 2Nd py,y; from the multiple correlation matrix 
and Kite) Peiy;, Pyiejs Pese; from the multiple alienation matrix for each 7, 7. The 
computational scheme is illustrated in Table IV where the correlations used 
are the sample correlations of Table III. The calculations from the multiple 
correlation matrix are presented in Table [Va and those from the multiple 
alienation matrix in Table IVb. 

In Table IVa the multiple correlation matrix is first entered on the third of 
each three lines. The square root of each diagonal term is then extracted to give 
the multiple correlation coefficients. The value of riz) is then locked in the 
machine as a divisor and it is divided, in turn, into Dipyo, Zyys to get r,,,. and 
Tyy3- Lhen raz) is used as a divisor by division into r,,,, to get Ty,y,, Into Dyys 
to get Ty,,., and into Dyys to get ryy;- Finally rsz) is divided into r,,,, to get 
Tuy, Into Dyys to get ry,y,, Into Ty», to get ry, and into Lyoy3 to get ryoy,. A 
check on these divisions can be made, if desired, by dividing r,,,, by riz) to get 
Tirves Torus BY T(z) to get Py, ANd Tyyy, by P22) to get Tyyy9. 

Table IVb is treated in a similar manner. 

This technique is immediately applicable to the case of many predicted 
variables. 
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INVERSION FORMULAS IN NORMAL VARIABLE MAPPING 


By Joun RiorDAN 
Bell Telephone Laboratories, New Y ork 


1. Summary. The two inversion formulas considered here arise from study of 
G. A. Campbell’s work on the Poisson summation, which is described more fully 
in the introduction and in the main consists of finding a function or mapping of 
a variable connected with the summation in terms of a normal (Gaussian) 
variable g. More generally, this last is a process often called “normalization of 
the variable” and associated with the names of E. A. Cornish and R. A. Fisher. 
The mapping is two-way and the main inversion formula determines co-efficients 
for one way from those for the other, both sets of coefficients being descriptive 
of their mappings. More precisely if xz is a given variable, g a Gaussian variable, 
y a parameter of the mapping, and the two mappings are 


z=gt+ X Ga(g) y"/n!, 


g=axt+ X X,(x)y"/n!, 


the formula expresses G,(x) in terms of X;(x), 7 < n, and vice versa. 
The second formula is more particularly related to the Poisson summation 
and relates coefficients p, = p,(g) and gn = gn(g) in the pair of equations 


oo 
a=c), qe ?"/n! 
0 
eo 
c=a), pra "/n! 
0 


Both formulas, which are necessarily elaborate, are given concise expression 
by the use of the multi-variable polynomials of E. T. Bell. 


2. Introduction. In 1923, in a paper little known in statistical circles, G. A. 
Campbell [2] gave as the basis for his extensive tabulation of the Poisson summa- 
tion an asymptotic series expressing the average a in terms of a normal variable 
g, corresponding to the probability of at least c occurrences, and c itself. That 
is to say, he associated with the Poisson summation 


eo 
P(a,c) = : i e°a’/zx! 
c 


a normal variable g, defined by 


1 Q ,; 
P(a,c) = Von [ e?? dy 
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and inverted the summation (which, as is well known, is equivalent to the in- 
complete Gamma function ratio) to give a series for a in terms of g and c. The 
series, which is carried to 11 terms, starts as follows: 





2 3 - 
age —-1l_ g — lg —3/2 
~ 1 : 1/2 .s i =o ee ‘| 
awe[it gem 4 Fo et 4 Ce + 
If x = (a —c) c” is introduced, this becomes 
g a g - i 4 
tow Canaries ge a wie 
g + _< + 36 + 
and x is seen to be, like g, a standardized variable of mean 0, variance 1. 

It seems to have gone unnoticed that this result includes the x° distribution 
through the transformation: 2a = X*, 2c = n and it has been rediscovered by 
A. M. Peiser [7] (4 terms) and by Goldberg and Levine [4] (6 terms). 

It is possible also to express c in terms of a and g, and a formula of this kind 
with fewer terms which appears in a footnote in Campbell’s paper is as fol- 
lows: 


c~a E — go? + Bot a’ + f+ 29 a”? + | 
a 

Finally there is a third possibility of expressing g in terms of the remaining 
variables, preferably x and c; though unnoticed by Campbell this has since been 
brought to prominence by Cornish and Fisher [3], Hotelling and Frankel [5] 
and Kendall [6]. 

The idea behind the first expansion appears most clearly in the second form 
and is that for c large the variable x behaves nearly like g. The third possibility 
reverses this expansion and gives a function of x and c which behaves like g; 
hence if this function is first evaluated, reference to the normal integral table 
gives an immediate evaluation of the probabilities in question. Put in another 
way, the expansion widens the scope of the normal integral table and for this 
reason has been called ‘‘normalization”’ of the variable (but this term seems pre- 
empted by its use in another sense for orthogonal functions, and has been re- 
placed in the title by normal variable mapping). 

From the point of view of statistical theory, the three expressions are different 
versions of one relationship, which suggests that there should be general rules 
for transforming a series of one type into that of another. The two inversion 
formulas given below supply these rules in what appears to be as compact a form 
as the problem allows. It will be noted that the proofs given suppose convergent 
series, a case which leads to clarity and brevity and is interesting in itself. Ap- 
plied to Campbell’s series, they give the known results so far as the latter go, 
but of course for other asymptotic series they need independent verifications. 


3. First Inversion Formula. This relates coefficients in series like Campbell’s 
first and its reverse as in Cornish and Fisher. More precisely 
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If Gi(g), Go(g) +++ are assigned polynomials and if 
(1) z=9 + 2 Galg)y"/nl, 


defines x in terms of g and a parameter y, then 


(2) g= 2+ 2, Xq(x)y"/n', 

where 

(3) —X,(z) — Y,(aGi(z), aG2(z), ee aG,(x)), 
TABLE 1 


Bell Polynomials Y,, (fg: , fg2 +++ fn) 


Y, = figs ‘ 
Yo = fige + fogi 
Y; = figs + fe(392g1) + fog 
Ys = figs + fal4gag: + 392) + fa(6gogi) + fai. 
Ys = figs + fo(5gag: + 10gsg2) + fs (10gsg1 + 159291) 
+ fa (10gogi) + fogi 
Ye = figs + fo(6gsg: + 15g.g2 + 1093) 
+ fa(15gagi + 60gsgeg1 + 1592) 
+ fi(209391 + 459591) + fs(15gog1) + fegi 
Yr = figz + fo(Tgegi + 21gsge + 35gags) 
+ fa(21gsgi + 105gago1 + 70g3gi + 1059392) 
+ fi(B5gagi + 210gsgogi + 1059291) 
+ fol35gagi + 1O5g2g1) + fo(21gagi) + fagi 
Ys = figs + fo(8g7g1 + 28geg2 + 569593 + 3594) 
+ f;(28gegi + 168gsgog: + 280gsgsg1 + 2109.92 + 2809592) 
+ fa(56gsgi + 420g.g291 + 2809391 + 840g:g2q1 + 10592) 
+ fs(70gagi + 560939293 + 4209392) 
+ fe(S6gsg1 + 2109391) + f:(28g29') + fegi 


Y,, being the multivariable polynomial of E. T. Bell [1], in the variables G(x) to 
G,(x) and the symbolic variable a which is such that 


a’ =a;=(-D)*", D=d/dz, 

with differentiations on all products of G,(x) to G,(x) associated with it in the poly- 
nomial. 

Note the symmetry of x and g, which allows the transformation to go either 
way, the inverse of (3) being 
(4) —Gr(g) = Yn(aXi(g), aX2(g) --- , aXx(9)) 

Table I gives explicit expressions for polynomials $Y_1$ to $Y_8$. The number of terms in $Y_n$ is the number of partitions of $n$. The variable $f_k$, which replaces $\alpha_k$ in the table, is associated with terms corresponding to partitions with $k$ parts. That is to say, if $Y_{n,k}$ designates such terms,

$$Y_n = \sum_{k=1}^{n} f_k\,Y_{n,k}.$$

The table may be verified or extended by the formulas and relations given by Bell (l.c.), or more directly by the modifications of Bell given by myself in [8].
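The table can also be checked directly, as an alternative to the methods of Bell or [8]. The coefficient of a term $g_{j_1}g_{j_2}\cdots g_{j_k}$ in $Y_{n,k}$ is given by the standard partition count $n!/\prod_j m_j!\,(j!)^{m_j}$, where $m_j$ is the number of parts equal to $j$. The minimal Python sketch below enumerates the partitions of $n$ into $k$ parts and applies that count. Its function names are illustrative and do not come from the paper.

```python
from collections import Counter
from math import factorial


def partitions(n, k, max_part=None):
    """Yield the partitions of n into exactly k parts, largest part first."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    # The largest part must leave at least 1 for each of the other k - 1 parts.
    for first in range(min(n - (k - 1), max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def bell_coefficient(part):
    """Coefficient of prod(g_j) over the parts j: n! / prod(m_j! * (j!)**m_j)."""
    n = sum(part)
    denom = 1
    for size, mult in Counter(part).items():
        denom *= factorial(mult) * factorial(size) ** mult
    return factorial(n) // denom


def table_row(n, k):
    """The f_k part of Y_n, i.e. Y_{n,k}, as {partition: coefficient}."""
    return {p: bell_coefficient(p) for p in partitions(n, k)}


print(table_row(4, 2))  # {(3, 1): 4, (2, 2): 3}, i.e. 4 g_3 g_1 + 3 g_2^2
```

Summing `len(table_row(8, k))` over $k = 1, \ldots, 8$ gives 22. This is the number of partitions of 8 and the number of terms in $Y_8$.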

The first few instances of (3), dropping the common variable $x$ for brevity, may be read off from Table I (with appropriate changes of notation and interpretation of $\alpha_k$) as follows:

$$
\begin{aligned}
-X_1 &= G_1\\
-X_2 &= G_2 - D(G_1^2)\\
-X_3 &= G_3 - 3D(G_2G_1) + D^2(G_1^3)\\
-X_4 &= G_4 - 4D(G_3G_1) - 3D(G_2^2) + 6D^2(G_2G_1^2) - D^3(G_1^4)
\end{aligned}
$$
These are applied to Campbell's first formula in its second form, with $y = c^{-1/2}$ and

$$G_1(x) = (x^2 - 1)/3, \qquad G_2(x) = (x^3 - 7x)/18,$$
$$G_3(x) = (-6x^4 - 14x^2 + 32)/270, \qquad G_4(x) = (9x^5 + 256x^3 - 433x)/1620.$$

For example, they give

$$-X_2 = \frac{x^3 - 7x}{18} - \frac{4x(x^2 - 1)}{9} = \frac{-7x^3 + x}{18},$$

and similarly for the others, resulting in

$$
\begin{aligned}
X_1 &= -(x^2 - 1)/3\\
X_2 &= (7x^3 - x)/18\\
X_3 &= -(219x^4 - 14x^2 - 13)/270\\
X_4 &= (3993x^5 - 152x^3 + 119x)/1620
\end{aligned}
$$
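The step from the $G_n$ to the $X_n$ can be mechanized. The Python sketch below is an illustration under stated assumptions, not the paper's procedure:

- Polynomials are lists of exact `Fraction` coefficients, constant term first.
- $Y_{n,k}$ is built from the standard recurrence $Y_{n,k} = \sum_{i=1}^{n-k+1}\binom{n-1}{i-1}\,g_i\,Y_{n-i,k-1}$ rather than from [8].
- $\alpha_k = (-D)^{k-1}$ is applied as in (3).

All helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def P(denom, *coeffs):
    """Polynomial sum(coeffs[i] * x**i) / denom as a list of Fractions, constant term first."""
    return [Fraction(c, denom) for c in coeffs]


def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def scale(p, c):
    return [c * a for a in p]


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def deriv(p):
    """D = d/dx applied to a coefficient list."""
    return [i * p[i] for i in range(1, len(p))]


def partial_bell(n, k, G):
    """Y_{n,k}(G[1], ..., G[n-k+1]) for polynomial arguments; G[0] is unused."""
    memo = {}

    def B(m, j):
        if (m, j) not in memo:
            if m == 0 and j == 0:
                memo[(m, j)] = [Fraction(1)]
            elif m == 0 or j == 0:
                memo[(m, j)] = []
            else:
                total = []
                for i in range(1, m - j + 2):
                    term = mul(G[i], B(m - i, j - 1))
                    total = add(total, scale(term, comb(m - 1, i - 1)))
                memo[(m, j)] = total
        return memo[(m, j)]

    return B(n, k)


def first_inversion(G, n):
    """X_n from (3): -X_n = sum over k of (-D)^(k-1) Y_{n,k}(G_1, ..., G_n)."""
    total = []
    for k in range(1, n + 1):
        term = partial_bell(n, k, G)
        for _ in range(k - 1):  # alpha_k = (-D)^(k-1)
            term = scale(deriv(term), -1)
        total = add(total, term)
    return trim(scale(total, -1))


# Campbell's first formula in its second form, with G_1..G_4 as given in the text.
G = [None,
     P(3, -1, 0, 1),                  # (x^2 - 1)/3
     P(18, 0, -7, 0, 1),              # (x^3 - 7x)/18
     P(270, 32, 0, -14, 0, -6),       # (-6x^4 - 14x^2 + 32)/270
     P(1620, 0, -433, 0, 256, 0, 9)]  # (9x^5 + 256x^3 - 433x)/1620

X = {n: first_inversion(G, n) for n in range(1, 5)}
for n, coeffs in X.items():
    print(f"X_{n}:", [str(c) for c in coeffs])
```

The printed lines list the coefficients of $X_1$ to $X_4$, constant term first. For example, $X_2$ appears as `['0', '-1/18', '0', '7/18']`, that is $(7x^3 - x)/18$.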


These determine a calculation formula for the Poisson summation that refines the normal approximation. That is to say,

$$P(a, c) = \Phi(g) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{g} e^{-t^2/2}\,dt,$$

with

$$g = x - \frac{x^2 - 1}{3\sqrt{c}} + \frac{7x^3 - x}{36c} - \frac{219x^4 - 14x^2 - 13}{1620\,c\sqrt{c}} + \frac{3993x^5 - 152x^3 + 119x}{38880\,c^2} + \cdots$$
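For numerical work the four-term series can be evaluated directly. The sketch below does only that, and its function names are illustrative:

- It takes $x$ and $c$ as given, since how $x$ is formed from $a$ and $c$ is not restated in this section.
- It sums the series for $g$ using $X_1$ to $X_4$ as printed above.
- It converts $g$ to $\Phi(g)$ with `math.erf`.

It is not checked here against an exact Poisson sum.

```python
import math


def normal_deviate(x, c):
    """g from the four-term series in y = c**(-1/2), with X_1..X_4 as given in the text."""
    y = c ** -0.5
    X1 = -(x**2 - 1) / 3
    X2 = (7 * x**3 - x) / 18
    X3 = -(219 * x**4 - 14 * x**2 - 13) / 270
    X4 = (3993 * x**5 - 152 * x**3 + 119 * x) / 1620
    return x + X1 * y + X2 * y**2 / 2 + X3 * y**3 / 6 + X4 * y**4 / 24


def normal_cdf(g):
    """Phi(g) = (2*pi)**(-1/2) * integral of exp(-t^2/2) from -inf to g, via math.erf."""
    return 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))


g = normal_deviate(1.0, 4.0)
print(g, normal_cdf(g))
```

For $x = 1$ and $c = 4$, the terms give $g = 1 + 1/24 - 2/135 + 11/1728 = 8927/8640 \approx 1.03322$.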
For the $t$-variate, the formula is applied in the reverse direction. Hotelling and Frankel supply the first four values of $X_n$, which in present notation give the series

$$g = x - \frac{x^3 + x}{4}\,y + \frac{13x^5 + 8x^3 + 3x}{48}\,\frac{y^2}{2} - \frac{35x^7 + 19x^5 + x^3 - 15x}{64}\,\frac{y^3}{6} + \frac{6271x^9 + 3224x^7 - 102x^5 - 1680x^3 - 945x}{3840}\,\frac{y^4}{24} + \cdots$$

The reversed series (obtained by (4)) is

$$x = g + \frac{g^3 + g}{4}\,y + \frac{5g^5 + 16g^3 + 3g}{48}\,\frac{y^2}{2} + \frac{3g^7 + 19g^5 + 17g^3 - 15g}{64}\,\frac{y^3}{6} + \frac{79g^9 + 776g^7 + 1482g^5 - 1920g^3 - 945g}{3840}\,\frac{y^4}{24} + \cdots$$


The first three terms are checked by Goldberg and Levine (l.c.). 
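The two $t$-variate series can be checked numerically against each other. Substituting the reversed series into the original should return $g$ up to terms of order $y^5$. The Python sketch below uses only the coefficients printed above. The point $g = 1$, $y = 0.01$ is an arbitrary choice for illustration.

```python
def forward(x, y):
    """g in terms of x: the Hotelling-Frankel series as written above (present notation)."""
    X1 = -(x**3 + x) / 4
    X2 = (13 * x**5 + 8 * x**3 + 3 * x) / 48
    X3 = -(35 * x**7 + 19 * x**5 + x**3 - 15 * x) / 64
    X4 = (6271 * x**9 + 3224 * x**7 - 102 * x**5 - 1680 * x**3 - 945 * x) / 3840
    return x + X1 * y + X2 * y**2 / 2 + X3 * y**3 / 6 + X4 * y**4 / 24


def reverse(g, y, terms=4):
    """x in terms of g: the reversed series, keeping the first `terms` powers of y."""
    G = [(g**3 + g) / 4,
         (5 * g**5 + 16 * g**3 + 3 * g) / 48,
         (3 * g**7 + 19 * g**5 + 17 * g**3 - 15 * g) / 64,
         (79 * g**9 + 776 * g**7 + 1482 * g**5 - 1920 * g**3 - 945 * g) / 3840]
    factorials = [1, 2, 6, 24]
    return g + sum(G[n] * y**(n + 1) / factorials[n] for n in range(terms))


g, y = 1.0, 0.01
for terms in (1, 2, 3, 4):
    # Round-trip residual; it shrinks as more terms of the reversed series are kept.
    print(terms, forward(reverse(g, y, terms), y) - g)
```

With only the first reversed term kept, the round trip is off by about $-y^2/4$. This is the size of the dropped $y^2$ term, $(5g^5 + 16g^3 + 3g)/96 = 1/4$ at $g = 1$.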

Another application worth noting is to the formulas of Cornish and Fisher 
which give G,(g) and X,(x) in terms of the relative cumulants of the distribution; 
to save space these are omitted. 

The derivation of the formula may be indicated most easily by Lagrange's formula for the expansion of one function in powers of another, in the following form¹:

Let $C$ be a contour in the complex $z$ plane enclosing the point $z = x$, and let $f(z)$ and $\phi(z)$ be analytic on and inside $C$. Let $y$ be such that $|y\phi(z)| < |z - x|$ when $z$ is on $C$, and let $g$ be that root of the equation

$$g = x + y\phi(g) \tag{5}$$

which lies inside $C$. Then

$$f(g) = \frac{1}{2\pi i}\oint_C f(z)\,\frac{d}{dz}\{\log[z - x - y\phi(z)]\}\,dz = f(x) + \sum_{n=1}^{\infty} X_n(x)\,\frac{y^n}{n!}, \tag{6}$$

where

$$X_n(x) = D^{n-1}\big[f'(x)\,(\phi(x))^n\big]. \tag{7}$$

The contour integral in (6) appears, slightly disguised, as a problem in Whittaker and Watson [Modern Analysis, Cambridge, 1920, p. 149]. The evaluation (7) is given for completeness. No use is made of it in this section, since the derivation proceeds directly from (6).
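Formula (7) can be tried on a case with a closed-form root. Take $f(g) = g$ and $\phi(z) = z^2$; this test case is chosen here for illustration and is not from the text. Then (5) reads $g = x + yg^2$ and $X_n(x) = D^{n-1}[x^{2n}] = \frac{(2n)!}{(n+1)!}x^{n+1}$. For small $xy$, the series in (6) should therefore reproduce the root $(1 - \sqrt{1 - 4xy})/(2y)$. The minimal Python sketch below compares the two.

```python
import math


def lagrange_root(x, y, terms=40):
    """Series (6) with f(g) = g and phi(z) = z**2, using X_n(x) = D^(n-1)[x^(2n)] from (7)."""
    g = x
    for n in range(1, terms + 1):
        X_n = math.factorial(2 * n) // math.factorial(n + 1) * x ** (n + 1)
        g += X_n * y**n / math.factorial(n)
    return g


def exact_root(x, y):
    """Root of g = x + y*g**2 that tends to x as y -> 0."""
    return (1.0 - math.sqrt(1.0 - 4.0 * x * y)) / (2.0 * y)


print(lagrange_root(0.5, 0.1), exact_root(0.5, 0.1))
```

The coefficients $X_n(x)\,y^n/n!$ here are $x$ times the Catalan numbers times $(xy)^n$, so the series converges quickly when $4xy$ is well below 1.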

First notice that by (1) and (5)

$$-y\phi(g) = \sum_{n=1}^{\infty} G_n(g)\,\frac{y^n}{n!},$$


¹ The author owes the suggestion for this to S. O. Rice, who also simplified the derivation of the second inversion formula given later.
so that the logarithm in (6) may be written

$$\log\Big(z - x + \sum_{n=1}^{\infty} G_n(z)\,\frac{y^n}{n!}\Big),$$

or

$$\log(z - x) + \log\Big[1 + \sum_{n=1}^{\infty} G_n(z)(z - x)^{-1}\,\frac{y^n}{n!}\Big],$$

or

$$\log(z - x) + \log\exp by, \tag{8}$$

with $b$ a symbolic variable such that

$$b^0 \equiv b_0 = 1, \qquad b^n \equiv b_n = G_n(z)(z - x)^{-1}.$$

Now suppose

$$\log(\exp by) = B_1y + B_2\frac{y^2}{2!} + \cdots = \exp By, \tag{9}$$

$B$ being another symbolic variable, with $B^0 \equiv B_0 = 0$ and $B^n \equiv B_n$. Then it follows from equation (5) of [8] that

$$
\begin{aligned}
B_n &= [D_y^n\log(\exp by)]_{y=0}, \qquad D_y = d/dy,\\
&= Y_n(\beta b_1, \beta b_2, \ldots, \beta b_n)\\
&= \sum_{k=1}^{n}\beta_k\,Y_{n,k}(b_1, b_2, \ldots, b_n),
\end{aligned} \tag{10}
$$

with $\beta_k = (-1)^{k-1}(k-1)!$ and $Y_{n,k}$ the part of polynomial $Y_n$ having $k$ parts, as defined above. Moreover, each factor $b_i$ of terms in $Y_{n,k}$ contributes $G_i(z)(z - x)^{-1}$, so that

$$B_n = \sum_{k=1}^{n}\beta_k\,(z - x)^{-k}\,Y_{n,k}(G_1(z), G_2(z), \ldots, G_n(z)). \tag{11}$$

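Then, by (6),

$$
\begin{aligned}
f(g) &= \frac{1}{2\pi i}\oint_C f(z)\,\frac{d}{dz}\big[\log(z - x) + \exp By\big]\,dz\\
&= f(x) - \frac{1}{2\pi i}\oint_C f'(z)\exp By\,dz\\
&= f(x) - \frac{1}{2\pi i}\oint_C f'(z)\sum_{n=1}^{\infty}B_n\,\frac{y^n}{n!}\,dz\\
&= f(x) - \sum_{n=1}^{\infty}\frac{y^n}{n!}\sum_{k=1}^{n}\beta_k\,\frac{1}{2\pi i}\oint_C\frac{Y_{n,k}(G_1(z), \ldots, G_n(z))\,f'(z)}{(z - x)^k}\,dz\\
&= f(x) - \sum_{n=1}^{\infty}\frac{y^n}{n!}\sum_{k=1}^{n}(-D)^{k-1}\big[f'(x)\,Y_{n,k}(G_1(x), \ldots, G_n(x))\big],
\end{aligned}
$$

with $D = d/dx$. The second line follows by an integration by parts. The last line is evaluated by the Cauchy formula for derivatives. Equation (3) follows from this and the substitution $f(g) = g$.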

4. Second Inversion Formula. This gives the interrelations of coefficients of series like the two Campbell series mentioned in the introduction. It runs as follows:

If $q_1(g), q_2(g), \ldots$ are given polynomials and

$$a = \sum_{n=0}^{\infty} q_n(g)\,\frac{c^{1 - n/2}}{n!}, \qquad q_0(g) = 1, \tag{12}$$

defines $a$ in terms of $g$ and a parameter $c$, then

$$c = \sum_{n=0}^{\infty} p_n(g)\,\frac{a^{1 - n/2}}{n!}, \qquad p_0(g) = 1, \tag{13}$$

where

$$-p_n(g) = Y_n(\alpha q_1(g), \alpha q_2(g), \ldots, \alpha q_n(g)), \tag{14}$$

with $\alpha^1 \equiv \alpha_1 = 1$ and $\alpha^j \equiv \alpha_j = (n - 4)(n - 6)\cdots(n - 2j)\,2^{1-j}$.

Equation (14) is formally similar to (3). By the same symmetry as before, $q_n(g)$ is readily expressible as a $Y_n$ polynomial in $p_1(g)$ to $p_n(g)$.

The first five instances of (14), dropping the argument for brevity, are

$$
\begin{aligned}
-p_1 &= q_1\\
-p_2 &= q_2 - q_1^2\\
-p_3 &= q_3 - \tfrac{3}{2}q_2q_1 + \tfrac{3}{4}q_1^3\\
-p_4 &= q_4\\
-p_5 &= q_5 + \tfrac{5}{2}(q_4q_1 + 2q_3q_2) - \tfrac{5}{4}(2q_3q_1^2 + 3q_2^2q_1) + \tfrac{15}{4}q_2q_1^3 - \tfrac{15}{16}q_1^5
\end{aligned}
$$

Campbell's first series has

$$q_1(g) = g, \qquad q_2(g) = 2(g^2 - 1)/3, \qquad q_3(g) = (g^3 - 7g)/6,$$
$$q_4(g) = (-12g^4 - 28g^2 + 64)/135, \qquad q_5(g) = (36g^5 + 1024g^3 - 1732g)/1296.$$

Applied to these, the formulas show that

$$p_1(g) = -g, \qquad p_2(g) = (g^2 + 2)/3, \qquad p_3(g) = (g^3 + 2g)/12,$$
$$p_4(g) = (12g^4 + 28g^2 - 64)/135, \qquad p_5(g) = (207g^5 + 548g^3 - 2684g)/1296.$$
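Equation (14) is easy to evaluate exactly once the $q_n$ are given. The sketch below is not the method of [8]:

- It works with exact rationals at a chosen value of $g$ rather than with polynomials in $g$.
- It uses the standard recurrence for $Y_{n,k}$.
- It takes $\alpha_j$ from (14).

The function names are hypothetical. Fed with Campbell's $q_1$ to $q_5$ above, it reproduces $p_1$ to $p_5$.

```python
from fractions import Fraction
from math import comb


def partial_bell(n, k, x):
    """Y_{n,k}(x[1], ..., x[n-k+1]) for numeric arguments; x[0] is unused."""
    B = [[Fraction(0)] * (k + 1) for _ in range(n + 1)]
    B[0][0] = Fraction(1)
    for m in range(1, n + 1):
        for j in range(1, min(m, k) + 1):
            B[m][j] = sum(comb(m - 1, i - 1) * x[i] * B[m - i][j - 1]
                          for i in range(1, m - j + 2))
    return B[n][k]


def alpha(n, j):
    """alpha_j of (14): 1 for j = 1, else (n-4)(n-6)...(n-2j) * 2**(1-j)."""
    a = Fraction(1)
    for i in range(2, j + 1):
        a *= Fraction(n - 2 * i, 2)
    return a


def p_from_q(q, n):
    """p_n from (14): -p_n = sum over j of alpha_j * Y_{n,j}(q_1, ..., q_n)."""
    return -sum(alpha(n, j) * partial_bell(n, j, q) for j in range(1, n + 1))


def campbell_q(g):
    """q_0, ..., q_5 of Campbell's first series, as given in the text, at the point g."""
    return [Fraction(1), g, Fraction(2, 3) * (g**2 - 1), (g**3 - 7 * g) / 6,
            (-12 * g**4 - 28 * g**2 + 64) / Fraction(135),
            (36 * g**5 + 1024 * g**3 - 1732 * g) / Fraction(1296)]


g = Fraction(3, 2)
print([str(p_from_q(campbell_q(g), n)) for n in range(1, 6)])
```

Each $p_n$ is a polynomial of degree at most 5 in $g$. Exact agreement at six distinct values of $g$ therefore confirms the corresponding identity completely.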


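The proof of (14) is as follows. First, for brevity, introduce symbolic variables $p$ and $q$ with the usual interpretation $p^n \equiv p_n(g)$, $q^n \equiv q_n(g)$, so that (12) and (13) read

$$a = c\exp qc^{-1/2}, \qquad c = a\exp pa^{-1/2}.$$

Now write $a = 1/x^2$ and $c = 1/y^2$, changing these to

$$x = y\,(\exp qy)^{-1/2}, \qquad y = x\,(\exp px)^{-1/2},$$

and note that

$$x^2y^{-2} = (\exp qy)^{-1} = \exp px. \tag{15}$$

This shows that $p_n$ is the coefficient of $x^n/n!$ in the expansion of $(\exp qy)^{-1}$ in powers of $x$. Since $y = x(\exp qy)^{1/2}$, Lagrange's formula gives at once ($D = d/dy$):

$$f(y) = f(0) + \sum_{n=1}^{\infty}\frac{x^n}{n!}\Big[D^{n-1}\big(f'(y)\,(\exp qy)^{n/2}\big)\Big]_{y=0}, \tag{16}$$

so that

$$
\begin{aligned}
(\exp qy)^{-1} &= 1 + \sum_{n=1}^{\infty}\frac{x^n}{n!}\Big[D^{n-1}\big(-(\exp qy)^{n/2 - 2}\,D(\exp qy)\big)\Big]_{y=0}\\
&= 1 - \sum_{n=1}^{\infty}\frac{x^n}{n!}\,\frac{2}{n - 2}\Big[D^n(\exp qy)^{(n-2)/2}\Big]_{y=0},
\end{aligned}
$$

or

$$-p_n = \frac{2}{n - 2}\Big[D^n(\exp qy)^{(n-2)/2}\Big]_{y=0} = Y_n(\alpha q_1, \alpha q_2, \ldots, \alpha q_n), \tag{17}$$

with $\alpha_j$ as in (14), by equation (5) of [8]. For $n = 2$, the middle expression is read as its limit $[D^2\log(\exp qy)]_{y=0} = q_2 - q_1^2$.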
ON THE DETERMINATION OF OPTIMUM PROBABILITIES
IN SAMPLING


By Morris H. Hansen and William N. Hurwitz


Bureau of the Census 


1. Summary. In a previous paper [2] it was shown that it is sometimes 
profitable to select sampling units with probability proportionate to size of the 
unit. This note indicates a method of determining the probabilities of selection 
which minimize the variance of the sample estimate at a fixed cost. Some ap- 
proximations that have practical applications are given. 


2. Introduction. Neyman has shown that it is possible to reduce the sampling 
variance of an estimate by dividing a population into sub-populations (called 
strata) and varying the proportions of units included in the sample from stratum 
to stratum [1]. His treatment presumed that the units within any stratum would 
be drawn with equal probability. In many practical sampling problems, the use 
of constant probabilities is neither necessary nor desirable. Not only is it possible 
to obtain unbiased or consistent estimates with varying probabilities of selection 
of the sampling units, but also it is possible to reduce the variance of sample 
estimates by appropriate use of this device. 

It has been shown [2] that in a subsampling system, the selection of primary 
units with probabilities proportionate to the number of elements included in the 
primary unit may bring about marked reductions in sampling variances over 
sampling with equal probabilities. In this note, we shall indicate a method of 
determining the optimum probabilities under certain conditions, and also some 
approximations to the optima that have practical applications. 

By optimum probabilities, we mean the set of probabilities of selection that 
will minimize the variance for a fixed cost of obtaining sample results, or alterna- 
tively that will minimize the cost for a fixed sampling error. 


3. Optimum probability with a subsampling system. Consider, for example, 
the simple subsampling system where primary units are first drawn for inclusion 
in the sample and then a sample of elements is drawn from the selected primary 
units. We shall suppose, for simplicity of notation, that the sampling is done with- 
out stratification. The conclusions indicated below will be similar if stratified 
sampling is used, and they will hold even if only one unit is drawn from each 
stratum. Suppose that a population contains M primary units, and that the 
sampling of primary units is to be done with replacement. Sampling with re- 
placement is assumed in order to simplify the mathematics. We wish to estimate 


the ratio $X/Y$, where

$$X = \sum_{i=1}^{M}\sum_{j=1}^{N_i} X_{ij}, \qquad Y = \sum_{i=1}^{M}\sum_{j=1}^{N_i} Y_{ij}.$$

Here $X_{ij}$ and $Y_{ij}$ are the values of two characteristics of the $j$th element within the $i$th primary unit, and $N_i$ is the number of elements in the $i$th primary unit.
A consistent estimate of $X/Y$ is given by

$$r = \frac{\sum_{i=1}^{m}\frac{N_i}{mP_in_i}\sum_{j=1}^{n_i}X_{ij}}{\sum_{i=1}^{m}\frac{N_i}{mP_in_i}\sum_{j=1}^{n_i}Y_{ij}}, \tag{1}$$

where

$P_i$ = The probability of selecting the $i$th primary unit on a single draw.

$n_i$ = The total number of elements included in the sample from the $i$th unit if it is drawn. If a particular unit happens to be included in the sample more than once, the subsampling will be carried through independently each time it is drawn.

$m$ = The total number of primary units included in the sample.
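A direct transcription of (1), as a minimal sketch. Each draw of a primary unit is represented by a tuple holding $P_i$, $N_i$ and the subsampled values of the two characteristics, so a unit drawn twice simply appears twice. The data layout and names are assumptions made for illustration.

```python
def ratio_estimate(draws, m):
    """Estimate (1) of X/Y.

    `draws` has one entry per draw of a primary unit: (P_i, N_i, xs, ys), where xs and
    ys hold the n_i subsampled values of the two characteristics. A unit drawn twice
    appears twice, with its own independent subsample.
    """
    num = den = 0.0
    for P_i, N_i, xs, ys in draws:
        n_i = len(xs)
        w = N_i / (m * P_i * n_i)  # weight given to each subsampled element
        num += w * sum(xs)
        den += w * sum(ys)
    return num / den


# Hypothetical data: two draws, with P_i * n_i / N_i = 0.04 for both.
draws = [(0.2, 10, [1.0, 2.0], [2.0, 2.0]),
         (0.4, 20, [3.0, 5.0], [4.0, 4.0])]
print(ratio_estimate(draws, m=len(draws)))
```

The hypothetical draws satisfy the self-weighting condition of the next paragraph with $k = 0.04$. Every element therefore receives the same weight, and the estimate reduces to the ratio of the raw sample totals, $11/12$.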


It will be assumed that a self-weighting sample is to be used. That is, although the probabilities of selecting primary units will vary, the subsampling rate within the $i$th selected primary unit, $n_i/N_i$, will be such that $P_i\,n_i/N_i = k$. With this condition, $k$ is the probability that an element will be included in the sample when a single primary unit is drawn and the specified subsampling is carried out within it. It follows that $mkN$ is the expected total number of elements included in a sample of $m$ primary units, where $N = \sum_{i=1}^{M}N_i$ is the total number of elements in the population.

The method can be extended to cover situations where other conditions are imposed.

We shall express the variance of $r$ in terms of $P_i$, $m$, and $k$, and also express the cost in terms of these same quantities. The optimum values of $P_i$, $m$, and $k$ will then be determined.

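The variance of the sample estimate. To terms of order $1/m$ in the Taylor expansion of a ratio, the sampling variance of the estimate (1) is approximately

$$\sigma_r^2 = \frac{1}{mY^2}\left[\sum_{i=1}^{M}\frac{N_i^2A_i^2}{P_i} - \sum_{i=1}^{M}\frac{N_i\sigma_i^2}{P_i} + \frac{1}{k}\sum_{i=1}^{M}N_i\sigma_i^2\right], \tag{2}$$

where $\bar X_i$ and $\bar Y_i$ are the means per element in the $i$th primary unit, $A_i = \bar X_i - (X/Y)\bar Y_i$, and

$$\sigma_i^2 = \sigma_{ix}^2 + \frac{X^2}{Y^2}\sigma_{iy}^2 - 2\frac{X}{Y}\sigma_{ixy},$$

$$\sigma_{ix}^2 = \frac{\sum_{j=1}^{N_i}(X_{ij} - \bar X_i)^2}{N_i - 1}, \qquad \sigma_{iy}^2 = \frac{\sum_{j=1}^{N_i}(Y_{ij} - \bar Y_i)^2}{N_i - 1}, \qquad \sigma_{ixy} = \frac{\sum_{j=1}^{N_i}(X_{ij} - \bar X_i)(Y_{ij} - \bar Y_i)}{N_i - 1}.$$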


The cost function. Now suppose that the total cost of the sampling procedure 
involves a fixed cost attached to each primary unit included in the sample, a 
cost of listing the elements within each selected primary unit (this listing may be 
necessary in order to draw a subsample), and a cost of obtaining information from 
each of the elements selected for inclusion in the sample. Under these circum- 
stances the total expected cost of the survey will be: 


$$C = C_1m + C_2m\sum_{i=1}^{M}P_iN_i + C_3mkN, \tag{3}$$

where

$C_1$ = The fixed cost per primary unit,

$C_2$ = The cost of listing one element in a selected primary unit, plus other costs that vary with the number of elements to be listed,

$C_3$ = The cost of obtaining the required information from one element in the sample,

$\sum_{i=1}^{M}P_iN_i$ = The expected number of elements (to be listed) per primary unit in the sample,

$mk$ = The over-all sampling ratio, and

$N = \sum_{i=1}^{M}N_i$ = The total number of elements in the population.

Although the values of $P_i$ and $m$ may be fixed in advance, the number of elements to be listed, $\sum N_i$ summed over the $m$ primary units drawn, remains a chance variable. For this reason we consider the expected cost rather than the actual cost.
The optimum values of $P_i$, $m$, and $k$. The values of $P_i$, $m$, and $k$ that minimize the variance (2), subject to the conditions that

$$C \text{ is fixed}, \qquad \frac{n_i}{N_i}P_i = k, \qquad \sum_{i=1}^{M}P_i = 1,$$

are given by

$$P_i = \frac{\sqrt{N_i^2\delta_i/(C_1 + C_2N_i)}}{\sum_{j=1}^{M}\sqrt{N_j^2\delta_j/(C_1 + C_2N_j)}}, \tag{4}$$

$$k = \frac{\sqrt{\sum_{i=1}^{M}N_i\sigma_i^2\,/\,(NC_3)}}{\sum_{j=1}^{M}\sqrt{N_j^2\delta_j/(C_1 + C_2N_j)}}, \tag{5}$$

$$m = \frac{C}{C_1 + C_2\sum_{i=1}^{M}P_iN_i + C_3kN}, \tag{6}$$

where

$$\delta_i = A_i^2 - \frac{\sigma_i^2}{N_i}.$$


Ordinarily $\delta_i$ will be positive, although it will often be found to be negative for some $i$. For a great many populations, such negative values can be avoided by classifying the primary units into size groups or other significant groups. One then requires that the probability of selection be $P_\alpha$ for every primary unit in the $\alpha$th group.
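Here is a minimal Python sketch of (4) to (6). It assumes the $N_i$, $\delta_i$ and $\sigma_i^2$ are known and every $\delta_i$ is positive; the grouping device just described is not implemented. The function name and argument layout are hypothetical.

```python
import math


def optimum_design(N, delta, sigma2, C, C1, C2, C3):
    """P_i, k and m from (4), (5) and (6).

    N, delta, sigma2 are per-primary-unit lists of N_i, delta_i (assumed > 0) and sigma_i^2;
    C is the fixed total cost and C1, C2, C3 the unit costs of (3).
    """
    w = [math.sqrt(n * n * d / (C1 + C2 * n)) for n, d in zip(N, delta)]
    total_w = sum(w)
    P = [wi / total_w for wi in w]                     # (4)
    N_total = sum(N)
    within = sum(n * s for n, s in zip(N, sigma2))    # sum N_i sigma_i^2
    k = math.sqrt(within / (N_total * C3)) / total_w   # (5)
    listing = sum(p * n for p, n in zip(P, N))         # sum P_i N_i
    m = C / (C1 + C2 * listing + C3 * k * N_total)     # (6)
    return P, k, m


P, k, m = optimum_design([10, 20, 40], [1.0] * 3, [1.0] * 3, C=1000.0, C1=5.0, C2=0.0, C3=1.0)
print(P, k, m)
```

With $C_2 = 0$ and constant $\delta_i$, the computed $P_i$ are $1/7$, $2/7$, $4/7$, proportional to size. This agrees with rule (a) of Section 4.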

In actual practice, however, the data needed to compute the optima are not available in advance of designing a sample, so the optimum probabilities must be approximated. Methods of approximating them are given below.


4. Some rules for approximating the optimum probabilities. In another paper [2], considerations were presented from which it follows that $\delta_i$ tends to decrease with increasing size of unit, but seldom as fast as the size of unit increases. The rate of decrease is often small relative to the increase in $N_i$. Empirical data for a number of problems indicate that even assuming $\delta_i$ fairly constant with increasing size of unit may not lead one far astray from the optimum probabilities. Under this assumption ($\delta_i = \delta$ for all $i$), the probabilities depend only on $N_i$, $C_1$, and $C_2$, and lead to the following results:

(a) When $C_1 > 0$ and $C_2 = 0$, probability proportionate to size will be the optimum.

(b) When $C_1 = 0$ and $C_2 > 0$, probability proportionate to the square root of the size will be the optimum.
Now go to the other extreme, extreme not in terms of mathematically possible values but in terms of most practical populations. If $\delta_i$ is assumed to decrease at the same rate that $N_i$ increases, the results would be:

(a) When $C_1 > 0$ and $C_2 = 0$, probability proportionate to the square root of the size will be the optimum.

(b) When $C_1 = 0$ and $C_2 > 0$, equal probability will be the optimum.
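The four limiting cases can be checked numerically from (4) alone. The sketch below uses hypothetical unit sizes, with $\delta_i$ either constant or proportional to $1/N_i$. It compares the optimum probabilities with probabilities proportional to $N_i^e$ for $e = 1, 1/2, 0$.

```python
import math


def optimum_probabilities(N, delta, C1, C2):
    """P_i from (4): proportional to sqrt(N_i^2 delta_i / (C1 + C2 N_i))."""
    w = [math.sqrt(n * n * d / (C1 + C2 * n)) for n, d in zip(N, delta)]
    s = sum(w)
    return [x / s for x in w]


def proportional_to(N, e):
    """Probabilities proportional to N_i**e (e = 1: size, 0.5: square root, 0: equal)."""
    w = [n ** e for n in N]
    s = sum(w)
    return [x / s for x in w]


def same(P, Q, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(P, Q))


N = [4, 9, 16, 25]               # hypothetical primary-unit sizes
flat = [0.3] * len(N)            # delta_i constant
falling = [1.0 / n for n in N]   # delta_i decreasing as fast as N_i increases

print(same(optimum_probabilities(N, flat, 5.0, 0.0), proportional_to(N, 1)))       # True
print(same(optimum_probabilities(N, flat, 0.0, 1.0), proportional_to(N, 0.5)))     # True
print(same(optimum_probabilities(N, falling, 5.0, 0.0), proportional_to(N, 0.5)))  # True
print(same(optimum_probabilities(N, falling, 0.0, 1.0), proportional_to(N, 0)))    # True
```

With both costs present, for example $C_1 = 5$ and $C_2 = 0.1$ with constant $\delta_i$, the largest unit's optimum probability falls between its square-root and proportionate-to-size values.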

The minimum is broad in the neighborhood of the optimum. The probabilities for either of these extremes, or for values in between, will often give results reasonably close to the minimum. This leads to the following useful approximations:

(a) Suppose $C_2\sum P_iN_i$, the expected cost per primary unit of listing and related operations, is small in relation to $C_1$, the fixed cost per primary unit. Then the optimum probabilities will be between probability proportionate to size and probability proportionate to the square root of size, and either of these will be reasonably close to the optimum.

(b) When $C_1$ is small compared to $C_2\sum P_iN_i$, the optimum probability will be between equal probability and probability proportionate to the square root of size. Either of these will be reasonably close to the optimum.

(c) Both $C_1$ and $C_2\sum P_iN_i$ may be of significant size, i.e., the costs may vary substantially both with the number of primary units in the sample and with the size of the units. Then probability proportionate to the square root of the size will be a reasonably good approximation to the optimum.

(d) When units of small size are used and all of the subunits in the selected primary units are included in the sample (that is, there is no subsampling), equal probability is close to the optimum. This rule does not follow directly from the above analysis based on subsampling. It comes from a separate analysis in which no subsampling is involved.

For whatever system of probabilities is used, and with the cost function given by (3), the optimum value of $k$ is given by

$$k = \sqrt{\frac{\sum_{i=1}^{M}N_i\sigma_i^2\left(C_1 + C_2\sum_{j=1}^{M}P_jN_j\right)}{NC_3\sum_{i=1}^{M}N_i^2\delta_i/P_i}}.$$

In application, this can be approximated from prior experience or preliminary studies. The corresponding optimum value for $m$ is obtained by substitution in the cost function.
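For a chosen set of $P_i$, not necessarily the optimum, the formula above and the cost function (3) give $k$ and then $m$. The sketch below does this, assuming the form of (2) given earlier; all names are hypothetical. It also evaluates $C\,Y^2\sigma_r^2$ at the optimum $k$: the bracket of (2) times the cost per primary unit. For a fixed budget this quantity is proportional to the variance, so it can be used to compare systems of probabilities.

```python
import math


def optimum_k(P, N, delta, sigma2, C1, C2, C3):
    """The optimum k for a given set of P_i (the formula above)."""
    N_total = sum(N)
    listing = sum(p * n for p, n in zip(P, N))                    # sum P_i N_i
    within = sum(n * s for n, s in zip(N, sigma2))                 # sum N_i sigma_i^2
    between = sum(n * n * d / p for n, d, p in zip(N, delta, P))   # sum N_i^2 delta_i / P_i
    return math.sqrt(within * (C1 + C2 * listing) / (N_total * C3 * between))


def m_from_cost(C, P, N, k, C1, C2, C3):
    """Solve the cost function (3) for m."""
    listing = sum(p * n for p, n in zip(P, N))
    return C / (C1 + C2 * listing + C3 * k * sum(N))


def budget_variance(P, N, delta, sigma2, C1, C2, C3):
    """C * Y^2 * sigma_r^2 at the optimum k, using the form of (2); smaller is better."""
    k = optimum_k(P, N, delta, sigma2, C1, C2, C3)
    listing = sum(p * n for p, n in zip(P, N))
    between = sum(n * n * d / p for n, d, p in zip(N, delta, P))
    within = sum(n * s for n, s in zip(N, sigma2))
    return (between + within / k) * (C1 + C2 * listing + C3 * k * sum(N))


N, ones = [10, 20, 40], [1.0, 1.0, 1.0]
pps = [n / sum(N) for n in N]
equal = [1 / 3] * 3
print(budget_variance(equal, N, ones, ones, 5.0, 0.0, 1.0),
      budget_variance(pps, N, ones, ones, 5.0, 0.0, 1.0))
```

Take these hypothetical numbers with $C_2 = 0$ and constant $\delta_i$. The product is about 61,200 under equal probability and about 51,300 under probability proportionate to size, which is the optimum in that case.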

The above results should not be accepted, of course, as the optima for every 
cost function or every sampling system. Either past experimental data may be 
available or pilot tests made to determine the cost function and the appropriate 
approximations that should be used in various practical situations. 

An illustration. An illustration may be of interest. One characteristic published for city blocks in the 1940 Census of Housing is the number of dwelling units that are in need of major repairs or that lack a private bath. Suppose we were sampling to estimate the proportion of dwelling units having this characteristic in the Bronx, New York City, at the time of the 1940 Census. Assume that, once a system of probabilities was selected, we used the optimum numbers of blocks and optimum sampling ratios for those probabilities, that is, the optimum values of $k$ and $m$. For each of several cost functions, Table 1 shows the sampling variance of each system relative to the variance of sampling with equal probability. It also shows values of $C_2\sum P_iN_i$ for comparison with $C_1$.

TABLE 1

Average cost per primary unit of listing and related operations ($C_2\sum P_iN_i$), and variances relative to equal probability. Three systems of probabilities are compared: equal probability (Equal), probability proportionate to the square root of size (Sq. root), and probability proportionate to size (Size).

Unit costs          Listing cost per primary unit      Variance relative to equal probability
C1    C2    C3      Equal    Sq. root    Size          Equal    Sq. root    Size
5     .10   1       13.49    21.15       27.63         100      92          104
5     .05   1        6.75    10.58       13.82         100      88           97
5     .02   1        2.70     4.23        5.53         100      83           87
5     0     1        0        0           0            100      75           73
2     .10   1       13.49    21.15       27.63         100      96          111
2     .05   1        6.75    10.58       13.82         100      93          106
2     .02   1        2.70     4.23        5.53         100      90           97
2     0     1        0        0           0            100      7?           77
1     .10   1       13.49    21.15       27.63         100      97          114
1     .05   1        6.75    10.58       13.82         100      96          110
1     .02   1        2.70     4.23        5.53         100      93          103
1     0     1        0        0           0            100      82           81
0     .10   1       13.49    21.15       27.63         100      99          117
0     .05   1        6.75    10.58       13.82         100      99          115
0     .02   1        2.70     4.23        5.53         100      99          113

Some of the cost relationships in the table are not unreasonable for situations encountered in practice in various types of jobs. The comparisons depend only on the relative magnitudes of the costs, not on their absolute magnitudes. The results are consistent with the rough rules of thumb given above. In each of the above instances, probability proportionate to the square root of the size yields a comparatively low variance.
5. Sampling with or without replacement. In this paper the sampling with
varying probabilities was assumed to be carried out with replacement which 
ordinarily would not be advisable in practice. When sampling is done without 
replacement the optimum probabilities and their approximations will be about 
the same as for sampling with replacement in at least those instances where the 
proportion of the population in the sample is small. Further investigation is 
needed for large sampling rates. 


6. Conclusion. In summary, it is not essential and may not be desirable to 
give each element in the population (or stratum) the same chance of being drawn 
in order to avoid bias or to have a consistent estimate. Estimate (1) is a con- 
sistent estimate no matter what probabilities of selection are assigned to these 
units. The use of variable probabilities of selection is another device to be added 
to those already in the literature, such as stratification and efficient methods of 
estimation, which make it possible to achieve the objectives of a sample survey
at reduced costs. Reference [2] gives another illustration of reductions in sampling 
variance achieved through the use of varying probabilities in accordance with 
the rules suggested above for approximating the optimum probabilities. 
A SOLUTION TO THE PROBLEM OF OPTIMUM CLASSIFICATION


By P. G. Hoel and R. P. Peterson


University of California, Los Angeles 


1. Summary. By means of a general theorem, the space of the variables of 
classification is separated into population regions such that the probability 
of a correct classification is maximized. The theorem holds for any number of 
populations and variables but requires a knowledge of population parameters 
and probabilities. A second theorem yields a large sample criterion for deter- 
mining an optimum set of estimates for the unknown parameters. The two 
theorems combine to yield a large sample solution to the problem of how best to 
discriminate between two or more populations. 


2. Introduction. There are essentially two basic problems in discriminant 
analysis. The first problem is to test whether the populations differ, since it 
would be futile to attempt a classification if the populations did not differ. The 
second problem is to find an efficient method for classifying individuals into their 
proper populations. In this paper, an optimum asymptotic solution of the 
second problem will be presented. 


3. Parameters known. Let $f_i = f_i(x_1, \ldots, x_k)$, $(i = 1, \ldots, r)$, denote the probability density function of population $i$ in the region under consideration. Let $p_i > 0$, $(i = 1, \ldots, r)$, denote the probability that population $i$ will be sampled if a single individual is selected at random from that region. Let $R$ denote the $k$-dimensional Euclidean variable space. Then the desired theorem is the following:

THEOREM 1. Let $M_i$ denote the region in $R$ where $p_if_i \ge p_jf_j$, $(j = 1, \ldots, r)$, and where $p_if_i > 0$. Assign any overlap to the $M_i$ with the smallest index. Then the set of regions $M_i$, $(i = 1, \ldots, r)$, will maximize the probability of a correct classification.

To prove this theorem, consider any other set of non-overlapping regions $M_i'$. Adding to any of the regions $M_i$ a part of $R$ throughout which all the functions $f_i$ vanish will not affect the probability of a correct classification. Hence there is no loss of generality in assuming that the set of regions $M_i$ contains the same portion of $R$ as the set of regions $M_i'$. The relationship between the two sets may be expressed by the formulas

$$M_i = \sum_{j=1}^{r} M_{ij} \tag{1}$$

and

$$M_j' = \sum_{i=1}^{r} M_{ij}, \tag{2}$$

where $M_{ij}$ denotes that part of $M_i$ which is contained in $M_j'$.
A sample point that falls in the region $M_i$ will be judged to have come from population $i$. The probability of correctly classifying a single random sample by means of the set $M_i$ is therefore

$$Q = p_1\int_{M_1} f_1\,dE + \cdots + p_r\int_{M_r} f_r\,dE, \tag{3}$$

where $dE = dx_1\,dx_2\cdots dx_k$. If $Q'$ denotes the probability of the correct classification by means of the set $M_i'$, then

$$Q' = p_1\int_{M_1'} f_1\,dE + \cdots + p_r\int_{M_r'} f_r\,dE.$$

In the notation of (1) and (2), these probabilities become

$$Q = p_1\sum_{j=1}^{r}\int_{M_{1j}} f_1\,dE + \cdots + p_r\sum_{j=1}^{r}\int_{M_{rj}} f_r\,dE$$

and

$$Q' = p_1\sum_{i=1}^{r}\int_{M_{i1}} f_1\,dE + \cdots + p_r\sum_{i=1}^{r}\int_{M_{ir}} f_r\,dE.$$

Now consider the difference $Q - Q'$. It can be expressed in the form

$$Q - Q' = \sum_{i=1}^{r}\sum_{j=1}^{r}\left[p_i\int_{M_{ij}} f_i\,dE - p_j\int_{M_{ij}} f_j\,dE\right] = \sum_{i=1}^{r}\sum_{j=1}^{r}\int_{M_{ij}}[p_if_i - p_jf_j]\,dE.$$

Since $M_{ij}$ is contained in $M_i$, and $p_if_i \ge p_jf_j$, $(j = 1, \ldots, r)$, holds throughout $M_i$, each of these integrals is non-negative. Consequently $Q \ge Q'$, which proves the theorem.
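Theorem 1 assigns a point to the population with the largest $p_if_i$. The Python sketch below applies this rule, with the smallest-index convention for ties, to a hypothetical pair of univariate normal populations. They have unit variances, means 0 and 2, and $p_1 = p_2 = 1/2$. For that pair the regions $M_1$ and $M_2$ are $x \le 1$ and $x > 1$, and $Q$ can be compared with the probability of correct classification for a shifted boundary.

```python
import math


def normal_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(x, priors, densities):
    """Index i of the region M_i containing x: the largest p_i f_i(x), ties to the smallest i."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    return scores.index(max(scores))  # index() returns the first (smallest) maximizing index


def prob_correct(boundary, mu1=0.0, mu2=2.0, p1=0.5):
    """Q for the rule: population 1 if x <= boundary, population 2 otherwise."""
    return p1 * normal_cdf(boundary - mu1) + (1.0 - p1) * (1.0 - normal_cdf(boundary - mu2))


priors = [0.5, 0.5]
densities = [lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 2.0)]
print([classify(x, priors, densities) for x in (0.9, 1.0, 1.1)])  # [0, 0, 1]
print(prob_correct(1.0), prob_correct(1.5))
```

For the optimum boundary, $Q = \Phi(1) \approx 0.841$. Moving the boundary to 1.5 lowers it to about 0.812.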

This theorem yields a solution to the classification problem only when the $f_i$ are completely specified and the $p_i$ are known.

This theorem is similar to a generalization of a fundamental lemma in the Neyman-Pearson theory of testing hypotheses [1], and to a result by Welch [2].

Suppose the basic weight function in Wald's [3] formulation of the multiple decision problem assumes only the values 0 and 1, according to whether or not a correct classification is made. Then the set of regions $M_i$ will minimize the expected value of the loss in that formulation.


4. Parameters unknown. Since the $p_i$, as well as the parameters in the $f_i$, are assumed to be unknown, $Q$ will be a function of such parameters. Let $\theta_1, \ldots, \theta_s$ denote all such parameters, including the $p_i$. Now let a random sample of size $n$ be taken from the region under consideration, and let $\hat\theta_1, \ldots, \hat\theta_s$ denote a set of estimates of the parameters based on this sample.

The total sample will constitute a sample of size $n_1$ from $f_1$, $n_2$ from $f_2$, etc., where $n = n_1 + \cdots + n_r$. The $\theta$'s for $f_i$ will therefore be estimated from a sample of size $n_i$ rather than of size $n$. In the following arguments, it will not be necessary to distinguish between $\hat\theta$'s estimated from samples of different sizes. The arguments are based on the order of terms with respect to the size of sample, and $n_i \sim np_i$ with probability one. Or, more simply, choose all $n_i$ equal.

Let $\hat M_i$ correspond to $M_i$ when the parameters are replaced by their sample estimates. Let $\hat Q$ denote the probability of a correct classification when the regions $\hat M_i$ are used in place of the regions $M_i$. Then, from (3),

$$Q - \hat Q = \sum_{i=1}^{r} p_i\left[\int_{M_i} f_i\,dE - \int_{\hat M_i} f_i\,dE\right].$$

Let $H = Q - \hat Q$. Since the estimates $\hat\theta_i$ are random variables, $H$ will be a random variable. It is a function of the estimation functions $\hat\theta_i$ as well as of the parameters $\theta_i$. The desired criterion for determining optimum estimates is then given by the following theorem:

THEOREM 2. Suppose $E(\hat\theta_i - \theta_i)^4 = O(n^{-g})$ with $g > 0$. Suppose also that, in some neighborhood of the point $\hat\theta_i = \theta_i$, $(i = 1, \ldots, s)$, the function $H$ is continuous and possesses continuous derivatives of the first, second, and third order with respect to the $\hat\theta_i$. Then

$$E(H) = \frac{1}{2}\sum_{i=1}^{s}\sum_{j=1}^{s}H_{ij}\,E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-3g/4}),$$

where $H_{ij}$ denotes the partial derivative of $H$ with respect to $\hat\theta_i$ and $\hat\theta_j$ at the point $(\theta_1, \ldots, \theta_s)$.

The proof is similar to the type of proof used by Cramér [4] to obtain an expression for the variance of a function of central moments.

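By Tchebycheff's inequality [4, page 182],

$$P[(\hat\theta_i - \theta_i)^4 \ge \epsilon^4] \le \frac{E(\hat\theta_i - \theta_i)^4}{\epsilon^4}.$$

From the theorem assumptions, there exists a constant $A$ such that

$$P[(\hat\theta_i - \theta_i)^4 \ge \epsilon^4] \le \frac{An^{-g}}{\epsilon^4}.$$

This is equivalent to

$$P[|\hat\theta_i - \theta_i| \ge \epsilon] \le \frac{An^{-g}}{\epsilon^4}.$$

Let $E_1$ denote the set of points in sample space where $|\hat\theta_i - \theta_i| < \epsilon$, $(i = 1, \ldots, s)$, and let $E_2$ denote the complementary set. This inequality implies that

$$P(E_2) \le \frac{sAn^{-g}}{\epsilon^4}. \tag{4}$$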
€ 
The expected value of $H$ may be written in the form

$$E(H) = \int_{E_1} H\,dP + \int_{E_2} H\,dP. \tag{5}$$

Consider the order of the second integral. $H$ is the difference of two probabilities, so by (4)

$$\left|\int_{E_2} H\,dP\right| \le \int_{E_2} dP = P(E_2) \le \frac{sAn^{-g}}{\epsilon^4}.$$

Consequently (5) becomes

$$E(H) = \int_{E_1} H\,dP + O(n^{-g}). \tag{6}$$
Ey 
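Now consider the first integral. Choose $\epsilon$ sufficiently small. Then, by the theorem assumptions, $H$ can be expanded at any point in the set $E_1$ in the form

$$H = H(\theta) + \sum_{i=1}^{s}(\hat\theta_i - \theta_i)H_i(\theta) + \frac{1}{2}\sum_{i=1}^{s}\sum_{j=1}^{s}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)H_{ij}(\theta) + R.$$

Here $\theta$ denotes the point $(\theta_1, \ldots, \theta_s)$,

$$R = \frac{1}{6}\sum_{i=1}^{s}\sum_{j=1}^{s}\sum_{k=1}^{s}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)H_{ijk}(\theta'),$$

and $\theta'$ is some point in $E_1$. Since $\hat Q$ reduces to $Q$ when $\hat\theta = \theta$, $H(\theta) = 0$. Furthermore, $Q$ is the maximum probability of a correct classification, so $H \ge 0$ for all $\hat\theta$. Hence $H_i(\theta) = 0$ and $H_{ii}(\theta) \ge 0$ for all $i$. Thus, for any point in the set $E_1$,

$$H = \frac{1}{2}\sum_{i=1}^{s}\sum_{j=1}^{s}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)H_{ij}(\theta) + R.$$

Substituting this expression in (6) gives

$$E(H) = \frac{1}{2}\sum_{i=1}^{s}\sum_{j=1}^{s}H_{ij}\int_{E_1}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP + \int_{E_1}R\,dP + O(n^{-g}). \tag{7}$$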


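Consider first the order of the remainder term. From the continuity assumption on $H_{ijk}$, it follows that $H_{ijk}$ is bounded in $E_1$, say $|H_{ijk}(\theta')| < B$. Hence

$$\left|\int_{E_1}R\,dP\right| \le \frac{B}{6}\sum_{i}\sum_{j}\sum_{k}\int_{E_1}|(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\,dP.$$

By Schwarz's inequality,

$$\int_{E_1}|(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\,dP \le \left[\int_{E_1}(\hat\theta_i - \theta_i)^2(\hat\theta_j - \theta_j)^2\,dP\int_{E_1}(\hat\theta_k - \theta_k)^2\,dP\right]^{1/2}.$$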
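Similarly,

$$\int_{E_1}(\hat\theta_i - \theta_i)^2(\hat\theta_j - \theta_j)^2\,dP \le \left[\int_{E_1}(\hat\theta_i - \theta_i)^4\,dP\int_{E_1}(\hat\theta_j - \theta_j)^4\,dP\right]^{1/2},$$

$$\int_{E_1}(\hat\theta_k - \theta_k)^2\,dP \le \left[\int_{E_1}(\hat\theta_k - \theta_k)^4\,dP\int_{E_1}dP\right]^{1/2} \le \left[\int_{E_1}(\hat\theta_k - \theta_k)^4\,dP\right]^{1/2}.$$

Since

$$\int_{E_1}(\hat\theta_i - \theta_i)^4\,dP \le \int_{E_1 + E_2}(\hat\theta_i - \theta_i)^4\,dP = O(n^{-g}),$$

the preceding inequalities combine to give

$$\left|\int_{E_1}R\,dP\right| = O(n^{-3g/4}). \tag{8}$$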
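Now consider the first integral in (7). It may be written in the form

$$\int_{E_1}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) - \int_{E_2}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP. \tag{9}$$

By Schwarz's inequality,

$$\left|\int_{E_2}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP\right| \le \left[\int_{E_2}(\hat\theta_i - \theta_i)^2\,dP\int_{E_2}(\hat\theta_j - \theta_j)^2\,dP\right]^{1/2}.$$

Similarly,

$$\int_{E_2}(\hat\theta_i - \theta_i)^2\,dP \le \left[\int_{E_2}(\hat\theta_i - \theta_i)^4\,dP\cdot P(E_2)\right]^{1/2}.$$

Combining these inequalities and using inequality (4), (9) reduces to

$$\int_{E_1}(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-g}). \tag{10}$$

Finally, using (8) and (10) in (7) reduces it to the result stated in the theorem.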
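Theorem 2 can be checked on the same hypothetical normal pair used above for Theorem 1, now with the means treated as estimated. Assume, for this illustration only, that the variances and $p_i$ are known. The boundary is estimated by $\hat t = (\bar x_1 + \bar x_2)/2$ from $n$ observations per population, so $\hat t$ is normal with variance $1/(2n)$. Then $H = Q(1) - Q(\hat t)$ depends on the estimates only through $\hat t$. The leading term of the theorem becomes $\tfrac12 H''\,\mathrm{Var}(\hat t)$ with $H'' = d\,\phi(d)$, where $d = 1$ is half the distance between the means. The sketch computes $E(H)$ by Simpson's rule and compares the two.

```python
import math

MU1, MU2 = 0.0, 2.0              # hypothetical means; unit variances, p1 = p2 = 1/2
T_STAR = (MU1 + MU2) / 2         # optimum boundary from Theorem 1


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Q(t):
    """Probability of a correct classification with boundary t."""
    return 0.5 * Phi(t - MU1) + 0.5 * (1.0 - Phi(t - MU2))


def H(t):
    return Q(T_STAR) - Q(t)


def expected_H_exact(n, steps=2000):
    """E(H) for t_hat ~ N(T_STAR, 1/(2n)), by Simpson's rule over +-10 standard deviations."""
    s = math.sqrt(1.0 / (2 * n))
    a, b = T_STAR - 10 * s, T_STAR + 10 * s
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        t = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * H(t) * phi((t - T_STAR) / s) / s
    return total * h / 3


def expected_H_theorem2(n):
    """Leading term (1/2) * H'' * Var(t_hat), with H'' = d * phi(d), d = (MU2 - MU1)/2."""
    d = (MU2 - MU1) / 2
    return 0.5 * d * phi(d) / (2 * n)


for n in (25, 100, 400):
    print(n, expected_H_exact(n), expected_H_theorem2(n))
```

The ratio of the two columns approaches 1 as $n$ grows, with the relative gap shrinking roughly like $1/n$. This is consistent with a remainder of higher order than the leading $O(n^{-1})$ term.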

The order of the leading term in $E(H)$ depends upon the nature of the estimating functions $\hat\theta_i$. To ensure that this term dominates, and so rule out pathological situations, only a restricted class of estimating functions (estimators) will be considered. For these, the leading term is of lower order in $1/n$ than the remainder term, that is, it tends to zero more slowly. If the estimators are means or central moments, for example, then $g = 2$. For such estimators the order of the remainder term is $O(n^{-3/2})$, whereas the order of the leading term is not higher than $O(n^{-1})$.

A set of estimators will be called an optimum set if it maximizes the expected value of the probability of a correct classification, or, equivalently, if it minimizes $E(H)$. Since only large samples are being considered here, optimum must be defined in an asymptotic sense. Consider sets of estimators for which $E(H)$ is of order $O(n^{-a})$. For this class of estimators, a set will be called asymptotically optimum if it minimizes

$$\lim_{n\to\infty} n^{a}E(H).$$

Among asymptotically optimum sets of various orders, the set corresponding to the highest order would naturally be considered the best asymptotic set. From Theorem 2, it readily follows that a set of estimators which minimizes

$$\sum_{i=1}^{s}\sum_{j=1}^{s}H_{ij}\,E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) \tag{11}$$

will be an asymptotically optimum set.


5. Maximum likelihood estimates. If the estimates $\hat\theta_i$ are unbiased and uncorrelated, (11) will reduce to

$$\sum_{i=1}^{s}H_{ii}\,\sigma_i^2, \tag{12}$$

where $\sigma_i^2 = E(\hat\theta_i - \theta_i)^2$ is a function of $n$ as well as of the parameters. From the discussion preceding (7), $H_{ii} \ge 0$, so (12) will be a minimum when the $\sigma_i^2$ assume their minimum values. It is known [4, page 504] that, under mild restrictions, maximum likelihood estimates possess minimum asymptotic variances. Hence, for estimators of the type considered here that also satisfy the conditions in [4], the maximum likelihood estimates of the $\theta_i$ will yield an asymptotically optimum set of estimates for the classification problem.


REFERENCES

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. Phil. Trans., Vol. 231 (1933), pp. 289-337.

[2] B. L. Welch, "Note on discriminant functions," Biometrika, Vol. 31 (1939), pp. 218-220.

[3] A. Wald, "Contributions to the theory of statistical estimation and testing hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-304.

[4] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, pp. 352-356.


NOTES 


This section is devoted to brief research and expository articles on methodology and
other short items.

A GENERALIZATION OF WALD'S FUNDAMENTAL IDENTITY


By GUNNAR BLOM


University of Stockholm 


1. Summary. The fundamental identity is generalized to the case of independent 
random variables with non-identical distributions. The conditions for the 
validity of the differentiation of the identity are discussed. The results given in 
[1], [2], and [3] are obtained as special cases. 


2. A property of cumulative sums. Let $z_1, z_2, \dots$ be an infinite sequence of
independent random variables, $F_1(z), F_2(z), \dots$ their distribution functions (d.f.)
and $\varphi_1(t), \varphi_2(t), \dots$ their moment-generating functions, so that $\varphi_\nu(t) = E(e^{tz_\nu})$.
$a_N$ and $b_N$ are given constants ($a_N > b_N$, $N = 1, 2, \dots$). $n$ is defined as the
smallest integer $N$ for which $Z_N = z_1 + \dots + z_N$ is $\ge a_N$ or $\le b_N$.

We first give two lemmas. 

Lemma 1. If two positive quantities $\delta$ and $\epsilon$ can be found such that at least one
of the following conditions a) and b) is satisfied:

a) $P(z_\nu > \delta) > \epsilon$ for all $\nu$ and $\limsup_{N\to\infty} a_N < \infty$,

b) $P(z_\nu < -\delta) > \epsilon$ for all $\nu$ and $\liminf_{N\to\infty} b_N > -\infty$,

then for any $k \ge 0$

$$\lim_{N\to\infty} N^k P(n > N) = 0. \tag{1}$$

An inspection of the proof of (4) in [4] shows that this formula holds when the
conditions of the lemma are satisfied. The lemma follows.
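The decay behind (1) can be seen in a simple special case. The sketch below assumes identically distributed steps $z_\nu = \pm 1$ with $P(z_\nu = 1) = 0.3$ and constant integer barriers $a_N = 5$, $b_N = -5$. These values are chosen only for illustration; condition a) of Lemma 1 holds with $\delta = 1/2$. The code computes $P(n > N)$ exactly by propagating the distribution of $Z_N$ on the event $\{n > N\}$. The printout shows that $N^3 P(n > N)$ shrinks towards zero.

```python
def survival_probs(p_up, a, b, n_max):
    """Return [P(n > 0), P(n > 1), ..., P(n > n_max)] for a walk with steps
    +1 (probability p_up) and -1, started at Z_0 = 0, where n is the first N
    with Z_N >= a or Z_N <= b (integer barriers b < 0 < a)."""
    alive = {0: 1.0}                 # law of Z_N restricted to the event n > N
    out = [1.0]
    for _ in range(n_max):
        nxt = {}
        for z, p in alive.items():
            for step, q in ((1, p_up), (-1, 1.0 - p_up)):
                s = z + step
                if b < s < a:        # strictly inside: the walk has not stopped
                    nxt[s] = nxt.get(s, 0.0) + p * q
        alive = nxt
        out.append(sum(alive.values()))
    return out

surv = survival_probs(0.3, 5, -5, 400)
for N in (50, 100, 200, 400):
    print(N, surv[N], N ** 3 * surv[N])   # N^k P(n > N) with k = 3
```

In this example $P(n > N)$ falls off geometrically, so it beats any power $N^k$.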
Lemma 1 can be generalized as follows.
Lemma 2. If two positive quantities $\delta$ and $\epsilon$ and a sequence $c_1, c_2, \dots$ can be
found such that at least one of the following conditions a) and b) is satisfied:

a) $P(z_\nu + c_\nu > \delta) > \epsilon$ for all $\nu$, $\limsup_{N\to\infty} a_N < \infty$, $\limsup_{N\to\infty} \sum_{1}^{N} c_\nu < \infty$,

b) $P(z_\nu + c_\nu < -\delta) > \epsilon$ for all $\nu$, $\liminf_{N\to\infty} b_N > -\infty$, $\liminf_{N\to\infty} \sum_{1}^{N} c_\nu > -\infty$,

then (1) is true.

Proof. In case a) we put $z'_\nu = z_\nu + c_\nu$, $Z'_N = \sum_1^N z'_\nu$ and $a'_N = a_N + \sum_1^N c_\nu$.
The inequality $Z_N \ge a_N$ then becomes $Z'_N \ge a'_N$. As $P(z'_\nu > \delta) > \epsilon$ and
$\limsup_{N\to\infty} a'_N < \infty$, Lemma 1 can be applied to the sequence $z'_1, z'_2, \dots$, and thus (1) is
true. When conditions b) are satisfied, the proof is analogous.


3. The generalized fundamental identity. In this section we shall consider
sequences of random variables of the type defined in Lemma 2. We shall prove
two theorems, the first of which is valid for complex values of $t$ and the second
only for real values of $t$.

THEOREM 1. Assume that

1°. at least one of conditions a) and b) of Lemma 2 is satisfied;
2°. $b \le b_N < a_N \le a$, where $a$ and $b$ are finite;
3°. for some complex (or real) value of $t$, $\varphi_\nu(t)$ exists for all $\nu$, is $\ne 0$, and

$$\liminf_{N\to\infty} |\varphi_1(t) \cdots \varphi_N(t)| > 0.$$

Then

$$E\left[e^{tZ_n}\, (\varphi_1(t) \cdots \varphi_n(t))^{-1}\right] = 1. \tag{2}$$
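Proof. Let $W_m$ denote the set of all sequences $z_1, \dots, z_N$ in the N-dimensional
Euclidean space $\Omega_N$ for which $n = m$ ($m \le N$), $W'_m$ the projection of $W_m$ on $\Omega_m$,
and $W_{n>N}$ the set of all sequences for which $n > N$. We have identically

$$\sum_{m=1}^{N} \int_{W_m} e^{tZ_N}\, dF_1 \cdots dF_N + \int_{W_{n>N}} e^{tZ_N}\, dF_1 \cdots dF_N = \int_{\Omega_N} e^{tZ_N}\, dF_1 \cdots dF_N = \varphi_1(t) \cdots \varphi_N(t).$$

Dividing by the right member and cancelling common factors we obtain

$$\sum_{m=1}^{N} (\varphi_1 \cdots \varphi_m)^{-1} \int_{W'_m} e^{tZ_m}\, dF_1 \cdots dF_m + (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{tZ_N}\, dF_1 \cdots dF_N = 1. \tag{3}$$

When $N \to \infty$ the first sum tends to the left member of (2). We thus have to
investigate the last term in (3), which we denote by $R_N$. We can write

$$R_N = (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{tZ_N}\, dF_1 \cdots dF_N = (\varphi_1 \cdots \varphi_N)^{-1} P(n > N)\, E_{n>N}\, e^{tZ_N}. \tag{4}$$

It follows from Lemma 2 that $P(n > N) \to 0$. As $b < Z_N < a$ on $\{n > N\}$ by 2°,
$|e^{tZ_N}|$ is bounded there, and $(\varphi_1 \cdots \varphi_N)^{-1}$ is bounded for large $N$ by 3°; we conclude
that $R_N \to 0$. This proves the theorem.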
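A Monte Carlo sketch of (2) for non-identically distributed variables follows. It assumes normal steps $z_\nu \sim N(\mu_\nu, 1)$, with $\mu_\nu$ alternating between 0.3 and -0.2. The barriers are constant, $a = 2$ and $b = -2$, and $t = \pm 0.4$ is real. These are illustrative choices under which 1° to 3° of Theorem 1 hold: Lemma 2 a) holds with $c_\nu = 0$, and $\varphi_1(t) \cdots \varphi_N(t) \to \infty$. For a normal step, $\log \varphi_\nu(t) = \mu_\nu t + t^2/2$.

```python
import math
import random

def identity_mc(t, mus, sigma=1.0, a=2.0, b=-2.0, reps=40_000, seed=7):
    """Monte Carlo estimate of E[exp(t Z_n) / (phi_1(t) ... phi_n(t))] for
    independent z_nu ~ N(mu_nu, sigma^2), mu_nu cycling through `mus`,
    with constant barriers a_N = a and b_N = b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z_sum = 0.0
        log_phi = 0.0            # log(phi_1(t) ... phi_N(t)) along the path
        nu = 0
        while b < z_sum < a:     # n is the first N with Z_N >= a or Z_N <= b
            mu = mus[nu % len(mus)]
            z_sum += rng.gauss(mu, sigma)
            log_phi += mu * t + 0.5 * sigma ** 2 * t ** 2   # normal m.g.f.
            nu += 1
        total += math.exp(t * z_sum - log_phi)
    return total / reps

print(identity_mc(0.4, [0.3, -0.2]), identity_mc(-0.4, [0.3, -0.2]))  # both near 1
```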
THEOREM 2. If, for some real value of $t$, $\varphi_\nu(t)$ exists for all $\nu$, and if quantities
$c_\nu$, $\epsilon > 0$ and $\delta > 0$ can be found such that at least one of the following conditions
a) and b) is satisfied for all $\nu$:

a) $\limsup_{N\to\infty} a_N < \infty$, $\limsup_{N\to\infty} \sum_1^N c_\nu < \infty$, and

$$A_\nu(t, \delta) = \frac{1}{\varphi_\nu(t)} \int_{\delta - c_\nu}^{\infty} e^{tz}\, dF_\nu(z) > \epsilon, \qquad (\nu = 1, 2, \dots), \tag{5a}$$

b) $\liminf_{N\to\infty} b_N > -\infty$, $\liminf_{N\to\infty} \sum_1^N c_\nu > -\infty$, and

$$B_\nu(t, \delta) = \frac{1}{\varphi_\nu(t)} \int_{-\infty}^{-\delta - c_\nu} e^{tz}\, dF_\nu(z) > \epsilon, \qquad (\nu = 1, 2, \dots), \tag{5b}$$

then (2) holds.

The conditions of the theorem become more attractive if the theorem is 
limited to the somewhat less general cases mentioned in the Corollary below. 
The above formulation has been chosen mainly because of an important applica- 
tion to identical variables in Sec. 6. 

Proof. The theorem is proved if we can show that $R_N$ in (4) tends to zero when
$N \to \infty$. For that purpose we use the transformation (cf. [5] and [3])

$$G_\nu(z; t) = \frac{1}{\varphi_\nu(t)} \int_{-\infty}^{z} e^{tu}\, dF_\nu(u), \qquad (\nu = 1, 2, \dots). \tag{6}$$

$G_\nu(z; t)$ is obviously a d.f. for every real $t$ (for which $\varphi_\nu(t)$ exists). When (5a)
holds,

$$P[z_\nu + c_\nu > \delta \mid G_\nu(z; t)] = A_\nu(t, \delta).$$

Here the expression in the left member denotes the probability that $z_\nu + c_\nu > \delta$
when $G_\nu$ is the d.f. of $z_\nu$.

Consequently, when conditions a) are fulfilled, a sequence of random variables
with the d.f.s $G_1(z; t), G_2(z; t), \dots$, or, in a single notation, $G(t)$, satisfies the
conditions a) of Lemma 2. It follows that

$$\lim_{N\to\infty} P(n > N \mid G(t)) = 0.$$

Introducing $G_\nu(z; t)$ in $R_N$ we find

$$R_N = \int_{W_{n>N}} dG_1 \cdots dG_N = P(n > N \mid G(t)).$$

Consequently $R_N \to 0$. When conditions b) are fulfilled, the proof is analogous.
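Transformation (6) is an exponential tilting of each $F_\nu$. The sketch below applies it to a hypothetical three-point law and checks two facts used in the proof. First, the tilted probability of $\{z_\nu + c_\nu > \delta\}$ equals $A_\nu(t, \delta)$ of (5a). Second, $R_N$ computed under the original law equals $P(n > N \mid G(t))$ computed under the tilted law. The step law, $t$, $\delta$, $c_\nu = 0$ and the integer barriers are all assumptions made for the example.

```python
import math

def tilt(pmf, t):
    """Transformation (6) for a discrete law: mass P(z = x) e^{t x} / phi(t) at x."""
    phi = sum(p * math.exp(t * x) for x, p in pmf.items())   # phi(t) = E e^{tz}
    return {x: p * math.exp(t * x) / phi for x, p in pmf.items()}, phi

def survival_mass(pmf, a, b, N, weight=lambda z: 1.0):
    """Sum, over step sequences with b < Z_k < a for k = 1..N, of
    P(sequence) * weight(Z_N); with weight = 1 this is P(n > N)."""
    alive = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for z, p in alive.items():
            for x, q in pmf.items():
                s = z + x
                if b < s < a:
                    nxt[s] = nxt.get(s, 0.0) + p * q
        alive = nxt
    return sum(p * weight(z) for z, p in alive.items())

# Hypothetical step law and parameters (c_nu = 0), for illustration only.
F = {-1: 0.5, 0: 0.3, 2: 0.2}
t, delta, a, b, N = 0.5, 0.5, 4, -3, 12
G, phi = tilt(F, t)

# (5a) with c_nu = 0: A(t, delta) equals P(z > delta) under the tilted law G.
A = sum(p * math.exp(t * x) for x, p in F.items() if x > delta) / phi
print(A, sum(p for x, p in G.items() if x > delta))

# R_N under F (weight e^{t Z_N} / phi^N) equals P(n > N) under G.
R_N = survival_mass(F, a, b, N, weight=lambda z: math.exp(t * z)) / phi ** N
print(R_N, survival_mass(G, a, b, N))
```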

COROLLARY TO THEOREM 2. If 1° $\varphi_\nu(t)\, e^{tc_\nu} \le H(t) < \infty$, and 2° $t$ is positive and
conditions a) of Lemma 2 hold, or $t$ is negative and conditions b) of Lemma 2 hold,
then the generalized fundamental identity is true.

For, in the first case,

$$A_\nu(t, \delta) \ge \frac{e^{t(\delta - c_\nu)}}{\varphi_\nu(t)}\, P(z_\nu + c_\nu > \delta) > \frac{e^{t\delta}\, \epsilon}{H(t)},$$

so that (5a) is satisfied, and similarly when $t$ is negative.

The following special case deserves particular attention, as it covers most
cases occurring in practice and the conditions become very simple: If a sequence
of random variables satisfies conditions a) and b) of Lemma 1 simultaneously, a
sufficient condition for the validity of (2) for some given real value of $t$ is that the
sequence $\varphi_\nu(t)$ is bounded.
4. Application to Poisson variables. As an application of (2) we consider a
sequence of Poisson variables with the parameters $\lambda m_\nu$, where $\lambda$ is a positive
quantity and the $m_\nu$ are positive integers. From the well-known formula

$$\varphi_\nu(t) = e^{\lambda m_\nu (e^t - 1)}$$

we easily conclude that the conditions of Theorem 1 are valid if $\mathrm{Re}(e^t) \ge 1$. (With
$\delta < 1$ in (5a) we find that (2) holds even for negative $t$.) If, in particular, we
choose $t$ so that $e^t = 1 + 2\pi i k/\lambda = c_k$, then $\varphi_\nu(t) = e^{2\pi i k m_\nu} = 1$ for every $\nu$, and we
have the simple formula

$$E(c_k^{Z_n}) = 1, \qquad (k = 1, 2, \dots).$$
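The formula $E(c_k^{Z_n}) = 1$ can be checked without simulation, because $Z_N$ is integer-valued. The sketch below propagates the exact distribution of $Z_N$ on $\{n > N\}$ and accumulates $c_k^{Z_n}$ over the paths that stop. It assumes $\lambda = 4$, $m_\nu$ alternating between 1 and 2, and a constant upper barrier $a_N = 10$. Any constant negative lower barrier works and is never reached, since $Z_N \ge 0$. These are illustrative parameters. Poisson probabilities beyond `jmax` are neglected, which is harmless here.

```python
import math

def poisson_pmf(mu, jmax):
    """[P(X = 0), ..., P(X = jmax)] for X ~ Poisson(mu)."""
    pmf = [math.exp(-mu)]
    for j in range(1, jmax + 1):
        pmf.append(pmf[-1] * mu / j)
    return pmf

def expected_ck_power(lam, m_seq, a, k=1, jmax=80, max_steps=5000, tol=1e-18):
    """E[c_k ** Z_n], c_k = 1 + 2*pi*i*k/lam, for independent Poisson steps with
    means lam * m_nu (m_nu cycling through m_seq) and integer upper barrier a."""
    c = 1 + 2j * math.pi * k / lam
    alive = [0.0] * a            # alive[z] = P(n > N, Z_N = z), z = 0..a-1
    alive[0] = 1.0
    total = 0j
    for step in range(max_steps):
        pmf = poisson_pmf(lam * m_seq[step % len(m_seq)], jmax)
        nxt = [0.0] * a
        for z, p in enumerate(alive):
            if p == 0.0:
                continue
            for j, q in enumerate(pmf):
                s = z + j
                if s < a:
                    nxt[s] += p * q              # still below the barrier
                else:
                    total += p * q * c ** s      # stopped with Z_n = s
        alive = nxt
        if sum(alive) < tol:
            break
    return total

print(expected_ck_power(4.0, [1, 2], 10))   # approximately (1+0j)
```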


5. Differentiation of the generalized fundamental identity. In this section $t$
is assumed to be real. We denote the $k$th derivative of $\varphi_\nu(t)$ by $\varphi_\nu^{(k)}(t)$. We shall
prove the following theorem, which corresponds to Theorems 1 and 2.

THEOREM 3. If for all $t$ in a closed interval $I$ the conditions stated in Theorems
1 or 2 are satisfied and if, in addition, the functions

$$\left| \frac{\varphi_\nu^{(k)}(t)}{\varphi_\nu(t)} \right|$$

are uniformly bounded with respect to both $\nu$ and $t$ (in $I$) for $k = 1, 2, \dots, r$, then the
generalized fundamental identity may be differentiated $r$ times with respect to $t$ for any
$t$ in the interior of $I$.

We use a method of proof which is similar to that used in [2]. We first show
that the sum in (3) may be differentiated $r$ times under the integral signs and
secondly that the $r$th derivative of $R_N$ tends to zero uniformly in $t$ when $N \to \infty$.

The $r$th derivative of the general term of the series in (3) consists of a finite
number of terms of the form

$$J_m(t) = (\varphi_1 \cdots \varphi_m)^{-1} H_\mu \int_{W'_m} Z_m^{\lambda}\, e^{tZ_m}\, dF_1 \cdots dF_m, \qquad (\mu + \lambda = r),$$

and the $r$th derivative of $R_N$ in (4) consists of a finite number (which does not
depend on $N$) of similar expressions with $N$ substituted for $m$ and $W_{n>N}$ for
$W'_m$. $H_\mu$ is a sum of $m^\mu$ and $N^\mu$ terms, respectively, which is symmetric in $\nu$.
The terms are functions of $\varphi_\nu^{(k)}/\varphi_\nu$ ($k \le r$; $\nu = 1, 2, \dots, m$) and are thus
majorated by the same constant $C$.

Further, we can always find a positive quantity $t_0$ such that for all $t$ in $I$

$$|Z_m^{\lambda} e^{tZ_m}| < e^{t_0 |Z_m|} < e^{t_0 Z_m} + e^{-t_0 Z_m}.$$

Hence

$$|J_m(t)| \le (\varphi_1 \cdots \varphi_m)^{-1}\, C m^{\mu} \int_{W'_m} (e^{t_0 Z_m} + e^{-t_0 Z_m})\, dF_1 \cdots dF_m. \tag{7}$$


The rest of the proof is divided into two parts corresponding to the conditions 
of Theorem 1 and those of Theorem 2. 
When the conditions of Theorem 2 are fulfilled we make the transformation
(6) in (7) with $t = t_0$ and $t = -t_0$. Then

$$|J_m(t)| \le C m^{\mu} \left[P(n = m \mid G(t_0)) + P(n = m \mid G(-t_0))\right] \le 2Cm^{\mu} < \infty.$$

This justifies the differentiation of the series in (3).
Substituting $N$ for $m$ and $n > N$ for $n = m$ in the above expression we further
have

$$|J_N(t)| \le C N^{\mu} \left[P(n > N \mid G(t_0)) + P(n > N \mid G(-t_0))\right],$$

and conclude from Lemma 2 with $k = \mu$ in (1) that $J_N(t)$ tends to zero uniformly
in $t$. It follows that the $r$th derivative of $R_N$ also tends to zero uniformly in $t$.

In the second part of the proof we assume the conditions of Theorem 1 to be
satisfied. We then write (7) in the following form

$$|J_m(t)| \le C (\varphi_1 \cdots \varphi_m)^{-1} m^{\mu} P(n = m)\, E_{n=m}(e^{t_0 Z_m} + e^{-t_0 Z_m}), \tag{8}$$

where $E_{n=m}$ signifies the conditional expectation when it is known that $n = m$.
From the definition of $n$ it follows that, when $n = m$, we have $b_{m-1} < Z_{m-1} < a_{m-1}$
and $Z_m \ge a_m$ or $Z_m \le b_m$. Hence

$$E_{n=m}(e^{t_0 Z_m}) \le E_{n=m}(e^{t_0 Z_m} \mid Z_m \ge a_m) = E_{n=m}\left(e^{t_0 (Z_{m-1} + z_m)} \mid Z_{m-1} + z_m \ge a_m\right) \le e^{t_0 a_{m-1}}\, E(e^{t_0 z_m} \mid z_m \ge a_m - b_{m-1}) < \infty.$$

The second exponential can be treated in a similar way. Thus $J_m(t)$ is majorated
by a finite expression.

Finally, we substitute $N$ for $m$ and $n > N$ for $n = m$ in (8). $I$ being a closed
interval, it follows from condition 3° in Theorem 1 that we can find a constant
$C$ such that

$$|J_N(t)| \le C N^{\mu} P(n > N)\, E_{n>N}(e^{t_0 Z_N} + e^{-t_0 Z_N}).$$

From the definition of $n$ and condition 2° in Theorem 1 we have $b < Z_N < a$.
An application of Lemma 2 then shows that $J_N(t)$ tends to zero uniformly in $t$.
This proves the theorem.

COROLLARY TO THEOREM 3. When the conditions stated in the Corollary to Theorem 2
are fulfilled for all $t$ in the closed interval $I$, Theorem 3 is true.

This is obvious.


6. The fundamental identity for identically distributed variables. In the
special case of identically distributed variables for which $P(z = 0) < 1$ and
$0 < |\varphi(t)| < \infty$, we infer from Theorem 1 that the fundamental identity

$$E\left[e^{tZ_n}(\varphi(t))^{-n}\right] = 1 \tag{9}$$

holds if $t$ is complex and $|\varphi(t)| \ge 1$. This is the case discussed in [1].

Further, when $P(z = 0) < 1$, the integrals $\int_{\alpha}^{\infty} e^{tz}\, dF$ and $\int_{-\infty}^{\beta} e^{tz}\, dF$ cannot both
be zero for every $\alpha > 0$ and $\beta < 0$, and thus we infer from Theorem 2 that the
fundamental identity holds for all real $t$ (if the limits $a_N$ and $b_N$ are chosen in
accordance with the conditions of this theorem). This proposition is somewhat
more general than that proved in [3] by a similar method.

It also follows from the last remark and Theorem 3 that, when $P(z = 0) < 1$,
(9) can be differentiated any number of times for any real $t$. This proposition
contains the results in [2] and [3] as special cases.
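For example, differentiating (9) once and twice at $t = 0$, where $\varphi(0) = 1$, gives two familiar relations. They are $E(Z_n) = E(n)E(z)$ and $E(Z_n - nE(z))^2 = E(n)\,\mathrm{var}(z)$. The Monte Carlo sketch below checks both. It assumes normal steps with mean 0.2 and variance 1 and barriers $a = 3$, $b = -2$; these are illustrative values, not taken from the text.

```python
import random

def wald_moments(mu=0.2, sigma=1.0, a=3.0, b=-2.0, reps=40_000, seed=11):
    """Monte Carlo estimates of E(Z_n) - mu E(n) and
    E[(Z_n - n mu)^2] - sigma^2 E(n) for i.i.d. N(mu, sigma^2) steps."""
    rng = random.Random(seed)
    sum_z = sum_n = sum_sq = 0.0
    for _ in range(reps):
        z, n = 0.0, 0
        while b < z < a:               # stop at the first exit from (b, a)
            z += rng.gauss(mu, sigma)
            n += 1
        sum_z += z
        sum_n += n
        sum_sq += (z - n * mu) ** 2
    mean_n = sum_n / reps
    first = sum_z / reps - mu * mean_n             # first derivative of (9) at 0
    second = sum_sq / reps - sigma ** 2 * mean_n   # second derivative of (9) at 0
    return first, second

print(wald_moments())   # both entries close to 0
```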

7. A generalization. We finally remark that the assumption made in Theorem
3 that the expressions containing derivatives of $\varphi_\nu(t)$ are uniformly bounded is
unnecessarily restrictive. For example, it seems possible to prove that the first
derivative of (2) may be obtained by differentiation under the expectation
sign if the series (cf. Corollary 1 to Theorem 7.4 in [6])

$$\sum_{m=1}^{\infty} P(n = m) \sum_{\nu=1}^{m} \frac{\varphi_\nu'(t)}{\varphi_\nu(t)}$$

is uniformly convergent with respect to $t$.
SPREAD OF MINIMA OF LARGE SAMPLES


By BROCKWAY MCMILLAN
Bell Telephone Laboratories, Murray Hill, N. J.


1. Theorems. Let $x$ have the continuous cumulative distribution function
$F(x)$. Let $(x_1, \dots, x_N)$ be a sample of $N$ independent values of $x$ and
$y = \inf(x_1, \dots, x_N)$. Then $y$ is a random variable with the cumulative distribution
function

$$G_N(y) = 1 - (1 - F(y))^N. \tag{1}$$

Let $K$ values of the new variable $y$ be drawn, $(y_1, \dots, y_K)$, and let the spread

$$w = \sup(y_1, \dots, y_K) - \inf(y_1, \dots, y_K).$$
Fixing $K$, we consider the cumulative distribution function of $w$, $P_N(w)$, as
$N \to \infty$. That is, we have $K$ large samples of $x$ and wish to examine the spread
among their minima. It is evident intuitively that if $F(x) = 0$ for some finite $x$,
these minima are bounded from below and will cluster near the vanishing point
of $F(x)$, making $w \to 0$ statistically as $N \to \infty$. Our theorems also show that
even when $y \to -\infty$ statistically, i.e., when $F(x) = 0$ for no finite $x$, the spread
$w \to 0$ statistically if the tail of $F(x)$ is sufficiently small (e.g., Gaussian). On
the other hand, if $F(x) = O(e^{cx})$ as $x \to -\infty$, the distribution $P_N(w)$ does not
peak as $N \to \infty$, while for larger tails (e.g., algebraic) $w \to +\infty$ statistically.
Two simple theorems are:


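I. Let $s > 0$. If

$$\lim_{x\to-\infty} \frac{F(x)}{F(x+s)} = 1,$$

then

$$\lim_{N\to\infty} P_N(s) = 0.$$

II. Let $s > 0$. If $F(x_0) = 0$ for some $x_0 > -\infty$, or if

$$\lim_{x\to-\infty} \frac{F(x)}{F(x+s)} = 0,$$

then

$$\lim_{N\to\infty} P_N(s) = 1.$$

Theorem I is directly applicable to distributions with algebraic tails, Theorem II
to Gaussian tails. We prove them both as corollaries of the more general results:

III. Let $s > 0$. If

$$\liminf_{x\to-\infty} \frac{F(x)}{F(x+s)} \ge l,$$

then

$$\limsup_{N\to\infty} P_N(s) \le (1 - l)^{K-1}.$$

IV. Let $s > 0$. If $F(x) = 0$ for no finite $x$ and

$$\limsup_{x\to-\infty} \frac{F(x)}{F(x+s)} \le L,$$

then

$$\liminf_{N\to\infty} P_N(s) \ge \left[e^{-a} - e^{-a/L}\right]^K$$

for any $a > 0$.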
Theorems III and IV together show that an exponential tail ($F(x) = O(e^{cx})$)
leads to a $P_N(w)$ which, asymptotically, is bounded away from 0 for any $w > 0$
and bounded away from 1 for $w$ sufficiently small.
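A Monte Carlo sketch of the exponential-tail case follows. It assumes $F(x) = e^x$ for $x \le 0$, that is, $x = -E$ with $E$ exponential. Then $F(x)/F(x+s) = e^{-s}$, so III and IV apply with $l = L = e^{-s}$. Each minimum is drawn exactly by inverting (1). For $K = 2$ in this example, the spread converges in law to the absolute difference of two independent Gumbel variables. That limit has distribution function $\tanh(s/2)$, so the estimate should sit near it, between the bounds of IV and III. The sample sizes and the seed are arbitrary.

```python
import math
import random

def spread_cdf_mc(N, K, s, reps=100_000, seed=3):
    """Monte Carlo P_N(s) = P(w <= s) for F(x) = e^x (x <= 0).  A minimum is drawn
    from (1): G_N(y) = U  <=>  y = log(1 - U**(1/N)), with U uniform."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = []
        for _ in range(K):
            u = rng.random()
            while u == 0.0:                    # avoid log(0)
                u = rng.random()
            # 1 - U**(1/N) computed stably as -expm1(log(U) / N)
            ys.append(math.log(-math.expm1(math.log(u) / N)))
        hits += (max(ys) - min(ys)) <= s
    return hits / reps

def bound_iii(s, K):
    """III with l = e^{-s}: upper bound (1 - l)^(K-1)."""
    return (1.0 - math.exp(-s)) ** (K - 1)

def bound_iv(s, K):
    """IV with L = e^{-s}: lower bound maximized over a grid of a > 0."""
    L = math.exp(-s)
    return max((math.exp(-a) - math.exp(-a / L)) ** K
               for a in (i / 1000 for i in range(1, 5000)))

p = spread_cdf_mc(10 ** 6, 2, 1.0)
print(bound_iv(1.0, 2), p, math.tanh(0.5), bound_iii(1.0, 2))
```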


2. Proofs. Explicitly, for any $s > 0$,

$$P_N(s) = K \int [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s). \tag{2}$$


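Turning now to III: given $\epsilon > 0$, choose $x_1 = x_1(\epsilon)$ so that (i) $F(x_1) \ne 0$, and
(ii) $x < x_1$ implies

$$\frac{F(x)}{F(x+s)} > l - \epsilon. \tag{3}$$

We then rewrite (2) as

$$P_N(s) = K \int_{-\infty}^{x_1} [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s) + K \int_{x_1}^{\infty} [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s). \tag{4}$$

Treating $G_N(x+s)^K$ as the independent variable, the first integral may be
evaluated by the mean value theorem in the form

$$\int_{-\infty}^{x_1} \left[1 - \frac{G_N(x)}{G_N(x+s)}\right]^{K-1} d\,G_N(x+s)^K \le \left[1 - \frac{G_N(x_2)}{G_N(x_2+s)}\right]^{K-1}, \tag{5}$$

with an appropriate $x_2 = x_2(N)$, $-\infty < x_2 < x_1$.
Using the form (2) of the integrand in the second term of (4), we may bound
the latter by

$$K \int_{x_1}^{\infty} dG_N(x+s) \le K[1 - G_N(x_1 + s)], \tag{6}$$

since

$$G_N(x+s) - G_N(x) \le 1.$$

Now, by factoring (1),

$$\frac{G_N(x)}{G_N(x+s)} = \frac{F(x)}{F(x+s)} \cdot \frac{1 + Q + \dots + Q^{N-1}}{1 + Q_1 + \dots + Q_1^{N-1}} \ge \frac{F(x)}{F(x+s)}, \tag{7}$$

where $Q = 1 - F(x)$, $Q_1 = 1 - F(x+s) \le Q$. Combining (3), (4), (5), (6),
and (7),

$$P_N(s) \le [1 - l + \epsilon]^{K-1} + K[1 - G_N(x_1 + s)].$$

Since $F(x_1 + s) \ge F(x_1) > 0$, we have

$$\lim_{N\to\infty} G_N(x_1 + s) = 1.$$

Hence,

$$\limsup_{N\to\infty} P_N(s) \le [1 - l + \epsilon]^{K-1},$$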
and III follows by letting $\epsilon \to 0$. Then I follows immediately with $l = 1$, when
we note that $P_N(s) \ge 0$.

To prove IV, choose any $a > 0$. By hypothesis, for sufficiently large $N$ we
may always find $x_N = x_N(a)$ such that

$$F(x_N) = \frac{a}{N}. \tag{8}$$

By hypothesis, and the monotonicity of $F(x)$, $x_N \to -\infty$ as $N \to \infty$. For any
$\epsilon > 0$, therefore, we can find $N_0 = N_0(a, \epsilon)$ such that $N > N_0$ implies

$$\frac{F(x_N)}{F(x_N + s)} < \frac{L}{1 - \epsilon}, \tag{9}$$

or $F(x_N + s) > a(1 - \epsilon)/(LN)$. Directly from (2), since $s > 0$,

$$P_N(s) = K \int [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s) \ge K \int_{x_N - s}^{x_N} [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s)$$
$$\ge K \int_{x_N - s}^{x_N} [G_N(x+s) - G_N(x_N)]^{K-1}\, dG_N(x+s).$$

But this last integral is of the form

$$K \int_{G}^{U} (u - G)^{K-1}\, du = (U - G)^K,$$

whence

$$P_N(s) \ge [G_N(x_N + s) - G_N(x_N)]^K,$$

or

$$P_N(s) \ge \left[(1 - F(x_N))^N - (1 - F(x_N + s))^N\right]^K. \tag{10}$$

By (8) and (9), therefore,

$$P_N(s) \ge \left[\left(1 - \frac{a}{N}\right)^N - \left(1 - \frac{a(1 - \epsilon)}{LN}\right)^N\right]^K.$$

Since this holds for all $N > N_0(a, \epsilon)$,

$$\liminf_{N\to\infty} P_N(s) \ge \left[e^{-a} - e^{-a(1-\epsilon)/L}\right]^K.$$

This last, in turn, now holds for any $\epsilon > 0$, hence

$$\liminf_{N\to\infty} P_N(s) \ge \left[e^{-a} - e^{-a/L}\right]^K.$$

This now holds for any $a > 0$. Maximizing on $a$ yields a sharper bound than the
result of IV. The applicable part of II follows, when $L = 0$, by letting $a \to 0$.
That the conclusion of II holds when $F(x_0) = 0$ for some finite $x_0$ follows from
(10) with $x_N$ replaced by some $x$ such that $F(x) = 0$, $F(x + s) > 0$.
ON THE CONVERGENCE OF THE CLASSICAL ITERATIVE METHOD
OF SOLVING LINEAR SIMULTANEOUS EQUATIONS¹


By EDGAR REICH
Massachusetts Institute of Technology


The classical iterative method, or Seidel method, is a scheme for solving the
system of linear algebraic equations

$$\sum_{j=1}^{n} A_{ij} x_j = b_i, \qquad (i = 1, 2, \dots, n),$$

by successive approximation, as follows:
If $x^{(\nu)} = (x_1^{(\nu)}, x_2^{(\nu)}, \dots, x_n^{(\nu)})$ is the $\nu$th approximation of the solution, the
$(\nu + 1)$st approximation, $x^{(\nu+1)} = (x_1^{(\nu+1)}, x_2^{(\nu+1)}, \dots, x_n^{(\nu+1)})$, is obtained from
the relations

$$\begin{aligned}
A_{11} x_1^{(\nu+1)} + A_{12} x_2^{(\nu)} + A_{13} x_3^{(\nu)} + \dots + A_{1n} x_n^{(\nu)} &= b_1, \\
A_{21} x_1^{(\nu+1)} + A_{22} x_2^{(\nu+1)} + A_{23} x_3^{(\nu)} + \dots + A_{2n} x_n^{(\nu)} &= b_2, \\
A_{31} x_1^{(\nu+1)} + A_{32} x_2^{(\nu+1)} + A_{33} x_3^{(\nu+1)} + \dots + A_{3n} x_n^{(\nu)} &= b_3, \\
&\ \ \vdots \\
A_{n1} x_1^{(\nu+1)} + A_{n2} x_2^{(\nu+1)} + A_{n3} x_3^{(\nu+1)} + \dots + A_{nn} x_n^{(\nu+1)} &= b_n,
\end{aligned}$$

$x_1^{(\nu+1)}$ being obtained from the first equation, then $x_2^{(\nu+1)}$ from the second, and
so on.

The given system can be written in matrix notation as $Ax = b$, where $A$ is
a non-singular square matrix of order $n$, and $x$ and $b$ are column vectors of order $n$.
Let us define square matrices $A_1$ and $A_2$ as follows:

$$(A_1)_{ij} = \begin{cases} A_{ij} & \text{if } i \ge j, \\ 0 & \text{if } i < j, \end{cases} \qquad (A_2)_{ij} = \begin{cases} A_{ij} & \text{if } i < j, \\ 0 & \text{if } i \ge j. \end{cases}$$

(Note that $A_1 + A_2 = A$.)
With this notation the Seidel method can be written as the matric difference
equation

$$A_1 x^{(\nu+1)} + A_2 x^{(\nu)} = b.$$
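The relations above translate directly into code. The sketch below is in pure Python, and its function names and test matrix are illustrative choices. Each sweep solves the $i$th equation for $x_i$, using the components already updated in the same sweep. This is exactly $A_1 x^{(\nu+1)} = b - A_2 x^{(\nu)}$.

```python
def seidel_sweep(A, b, x):
    """One sweep of the Seidel method: solve equation i for x_i, using the
    components x_1..x_{i-1} already updated in this sweep and the old
    x_{i+1}..x_n, i.e. A_1 x_new = b - A_2 x_old."""
    n = len(b)
    x = list(x)                       # x[j] is overwritten in place as we go
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]   # requires A_ii != 0, i.e. A_1 invertible
    return x

def seidel_solve(A, b, sweeps=100):
    """Start from x = 0 and apply a fixed number of sweeps."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = seidel_sweep(A, b, x)
    return x

# Illustrative symmetric positive definite system with solution (1, -2, 3).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [2.0, -2.0, 4.0]
print(seidel_solve(A, b))   # approximately [1.0, -2.0, 3.0]
```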
¹ Work done under Office of Naval Research Contract N5ori60.

Now various writers, among them C. E. Berry in this journal (see the list of references
at the end of this paper), have shown that a necessary and sufficient condition
for convergence, i.e., a necessary and sufficient condition for

$$\lim_{\nu\to\infty} (x_i^{(\nu)} - x_i) = 0, \qquad (i = 1, 2, \dots, n),$$

is that
(1) $A_1$ has an inverse; that is, $A_{ii} \ne 0$ for any $i$;
(2) the characteristic roots of $(A_1^{-1}A_2)$ all have an absolute value smaller
than unity.
It would be advantageous to rephrase the above condition, if possible, in terms
of simpler requirements on $A$. As a step in this direction the following theorem
is offered:

THEOREM. If $A$ is a real, symmetric $n$th-order matrix with all terms on its main
diagonal positive, then a necessary and sufficient condition for all the $n$ characteristic
roots of $(A_1^{-1}A_2)$ to be smaller than unity in magnitude is that $A$ is positive
definite.

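Proof. Let $z_j$ be a characteristic vector of $(A_1^{-1}A_2)$ corresponding to the
characteristic root $\mu_j$. Then

$$(A_1^{-1}A_2)\, z_j = \mu_j z_j. \tag{1}$$

Premultiplying by $\bar z_i' A_1$, where the apostrophe and bar denote transposition
and conjugation respectively:

$$\bar z_i' A_2 z_j = \mu_j\, \bar z_i' A_1 z_j. \tag{2}$$

Consider the bilinear form $\bar z_i' A z_j$. We have

$$\bar z_i' A z_j = \bar z_i' A_1 z_j + \bar z_i' A_2 z_j = (1 + \mu_j)\, \bar z_i' A_1 z_j. \tag{3}$$

Interchanging $i$ and $j$:

$$\bar z_j' A z_i = (1 + \mu_i)\, \bar z_j' A_1 z_i. \tag{4}$$

Taking the conjugate:

$$z_j' A \bar z_i = \bar z_i' A z_j = (1 + \bar\mu_i)\, z_j' A_1 \bar z_i = (1 + \bar\mu_i)\, \bar z_i' A_1' z_j. \tag{5}$$

Let $D$ be the diagonal matrix with elements

$$D_{ij} = A_{ii}\, \delta_{ij}. \tag{6}$$

This makes $A_1' = D + A_2$.
Substituting this in (5):

$$\bar z_i' A z_j = (1 + \bar\mu_i)(\bar z_i' D z_j + \bar z_i' A_2 z_j) = (1 + \bar\mu_i)\, \bar z_i' D z_j + (1 + \bar\mu_i)\mu_j\, \bar z_i' A_1 z_j. \tag{7}$$

Eliminating $\bar z_i' A_1 z_j$ between relations (3) and (7) we obtain

$$(1 - \bar\mu_i \mu_j)\, \bar z_i' A z_j = (1 + \bar\mu_i)(1 + \mu_j)\, \bar z_i' D z_j. \tag{8}$$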
To obtain the necessary condition we use the fact that we must have $|\mu_i| < 1$,
and can therefore rewrite (8) as

$$\bar z_i' A z_j = (1 + \bar\mu_i)(1 + \mu_j) \sum_{k=0}^{\infty} (\bar\mu_i \mu_j)^k\, \bar z_i' D z_j. \tag{9}$$

If $x = \sum_{i=1}^{m} c_i z_i$ is any linear combination of the $m \le n$ independent characteristic
vectors of $(A_1^{-1}A_2)$, then

$$\bar x' A x = \sum_{i,j=1}^{m} \bar c_i c_j\, \bar z_i' A z_j = \sum_{k=0}^{\infty} \sum_{i,j=1}^{m} \bar c_i c_j (1 + \bar\mu_i)(1 + \mu_j)\, \bar\mu_i^k \mu_j^k\, \bar z_i' D z_j = \sum_{k=0}^{\infty} \bar y_k' D y_k, \tag{10}$$

where

$$y_k = \sum_{i=1}^{m} c_i (1 + \mu_i)\, \mu_i^k\, z_i.$$

Since by hypothesis $A_{ii} > 0$, $D$ is evidently positive definite. Moreover $y_0 \ne 0$ when
$x \ne 0$, because the $z_i$ are independent and $\mu_i \ne -1$. Therefore

$$\bar x' A x > 0 \qquad (x \ne 0). \tag{11}$$

In case the characteristic roots $\mu_i$ ($i = 1, 2, \dots, n$) are all distinct there will be $n$
independent $z_i$ assured, and in that case (11) implies that $A$ is positive definite.
Consider, on the other hand, the case where the $\mu_i$ are not all distinct. Note
that (a) the definiteness properties of a matrix are not changed by sufficiently
small alterations in the elements; (b) the $\mu$'s depend continuously on the elements
of $A$; (c) the discriminant of (1) is a polynomial in the $A_{ij}$ that does not vanish
identically.² It follows that $A$ must be positive definite even in the case of repeated
roots, because an arbitrarily small change in $A$ will separate any multiple
$\mu$'s, still keeping them smaller than unity in magnitude, and not changing the
definiteness properties of $A$.

This completes the proof that the condition given in the statement of the
theorem is necessary. Now to prove sufficiency:

Setting $i = j$ in relation (8) we obtain

$$(1 - |\mu_j|^2)\, \bar z_j' A z_j = |1 + \mu_j|^2\, \bar z_j' D z_j. \tag{12}$$

Since both $A$ and $D$ are positive definite,

$$\bar z_j' A z_j > 0 \quad \text{and} \quad \bar z_j' D z_j > 0. \tag{13}$$


² The fact that the discriminant is not identically zero follows from easily constructible
counter-examples.
Moreover, we cannot have $\mu_j = -1$, because that would mean by (3) that

$$0 = \bar z_j' A_1 z_j + \bar z_j' A_2 z_j = \bar z_j' A z_j,$$

contradicting (13). Relation (12) thus implies

$$1 - |\mu_j|^2 > 0, \tag{14}$$

i.e., $|\mu_j| < 1$, as was to be proved.
The part of the theorem giving the sufficient condition was already obtained
by L. Seidel [1] and G. Temple in a somewhat more indirect fashion.
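For $n = 2$ the theorem can be verified in closed form. Take $A = \begin{pmatrix} p & q \\ q & r \end{pmatrix}$ with $p, r > 0$. Then $A_1^{-1}A_2 = \begin{pmatrix} 0 & q/p \\ 0 & -q^2/(pr) \end{pmatrix}$, whose characteristic roots are $0$ and $-q^2/(pr)$. Hence $|\mu| < 1$ exactly when $pr - q^2 > 0$, that is, when $A$ is positive definite. The sketch below uses illustrative matrices and helper names of its own. It confirms that each Seidel sweep on $Ax = 0$ multiplies the error by $q^2/(pr)$.

```python
def seidel_error_ratio(p, q, r, sweeps=5):
    """Seidel sweeps on [[p, q], [q, r]] x = 0 starting from x2 = 1.  The exact
    solution is 0, so the iterate is the error; return |x2_new / x2_old| for
    the last sweep."""
    x2 = 1.0                      # x1 need not be initialized: each sweep overwrites it
    ratio = None
    for _ in range(sweeps):
        x1 = -q * x2 / p          # first equation uses the old x2
        new_x2 = -q * x1 / r      # second equation uses the new x1
        ratio = abs(new_x2 / x2)
        x2 = new_x2
    return ratio

def is_positive_definite_2x2(p, q, r):
    return p > 0 and p * r - q * q > 0

for p, q, r in [(4.0, 1.0, 3.0), (1.0, 2.0, 1.0), (2.0, 2.0, 2.0)]:
    rho = seidel_error_ratio(p, q, r)
    print((p, q, r), rho, q * q / (p * r), is_positive_definite_2x2(p, q, r))
```

The second example has a positive diagonal but is not positive definite, and the iteration diverges. The third is only semidefinite, and the error does not shrink.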
SOME RECURRENCE FORMULAE IN THE INCOMPLETE BETA
FUNCTION RATIO


By T. A. BANCROFT


Alabama Polytechnic Institute


1. Introduction. It is well known that the incomplete beta function ratio,
defined by

$$I_x(p, q) = \frac{B_x(p, q)}{B(p, q)}, \tag{1}$$

where

$$B_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\, dt, \tag{2}$$

$$B(p, q) = B_1(p, q), \tag{3}$$
is of importance in probability distribution theory and, hence, also in obtaining
exact probability values in making tests of statistical hypotheses. In constructing
certain extensions [1] of Karl Pearson's "Tables of the Incomplete Beta-Function"
[2], the recurrence formulae contained in the following sections were derived.


2. Derivation of formulae. The incomplete beta function $B_x(p, q)$ may be
considered as a special case of the hypergeometric series $F(a, b, c, x)$, thus

$$B_x(p, q) = \frac{x^p}{p}\, F(p, 1 - q, p + 1, x). \tag{4}$$

The series converges for $|x| < 1$, and for $x = 1$ if and only if $a + b < c$. By setting
$a = p$, $b = 1 - q$, and $c = p + 1$, as in (4), all conditions are satisfied if we also take
$q > 0$.

Recurrence formulae for $F(a, b, c, x)$, e.g., in the work of Magnus and Oberhettinger
[3], may now be directly converted for use with $B_x(p, q)$ or $I_x(p, q)$.
In particular, using the three identities on page 9 of [3], with $x$ replacing $z$, we
have

$$cF(a, b, c, x) + (b - c)F(a + 1, b, c + 1, x) - b(1 - x)F(a + 1, b + 1, c + 1, x) = 0, \tag{5}$$

$$c(c - ax - b)F(a, b, c, x) - c(c - b)F(a, b - 1, c, x) + abx(1 - x)F(a + 1, b + 1, c + 1, x) = 0, \tag{6}$$

$$cF(a, b, c, x) - cF(a, b + 1, c, x) + axF(a + 1, b + 1, c + 1, x) = 0; \tag{7}$$

with $a = p$, $b = 1 - q$, and $c = p + 1$, we obtain in turn

$$xI_x(p, q) - I_x(p + 1, q) + (1 - x)I_x(p + 1, q - 1) = 0, \tag{8}$$

$$(p + q - px)I_x(p, q) - qI_x(p, q + 1) - p(1 - x)I_x(p + 1, q - 1) = 0, \tag{9}$$

$$qI_x(p, q + 1) + pI_x(p + 1, q) - (p + q)I_x(p, q) = 0. \tag{10}$$

Formula (8) is the basic recurrence formula used in the construction of Karl
Pearson's [2] tables. Formula (10) was obtained, incidentally, by the author [4]
in a different connection and manner.

Formulae (8), (9), and (10) may now be combined to give other useful formulae,
e.g.,

$$qI_x(p + 1, q + 1) + [(p + q)x - q]\, I_x(p + 1, q) - (p + q)x\, I_x(p, q) = 0, \tag{11}$$

$$pI_x(p + 1, q + 1) + [q - (p + q)x]\, I_x(p, q + 1) - (p + q)(1 - x)\, I_x(p, q) = 0, \tag{12}$$

$$(p + q - 1)x\, I_x(p - 1, q) - [(p + q - 1)x + p]\, I_x(p, q) + pI_x(p + 1, q) = 0, \tag{13}$$

$$(p + q)(1 - x)\, I_x(p + 1, q - 1) - [(p + q)(1 - x) + q]\, I_x(p + 1, q) + qI_x(p + 1, q + 1) = 0. \tag{14}$$

Notice that the sum of the coefficients is always zero.
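Relations (8) to (14) are easy to confirm numerically. The sketch below evaluates $I_x(p, q)$ from the positive-term series $B_x(p, q) = \frac{x^p (1-x)^q}{p} \sum_{k \ge 0} \frac{(p+q)_k}{(p+1)_k} x^k$, where $(a)_k$ is the rising factorial. This is a standard transformation of (4), chosen here because the alternating series (4) loses accuracy badly for large $q$. The test values of $p$, $q$, $x$ are arbitrary.

```python
import math

def inc_beta_ratio(x, p, q, max_terms=10_000):
    """I_x(p, q) for 0 < x < 1 and p, q > 0, from the positive series
    B_x(p, q) = x^p (1-x)^q / p * sum_k (p+q)_k / (p+1)_k * x^k."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (p + q + n) / (p + 1 + n) * x
        total += term
        # stop once terms are negligible and the term ratio is below 1
        if term < 1e-17 * total and (p + q + n) * x < (p + 1 + n):
            break
    log_front = p * math.log(x) + q * math.log1p(-x) - math.log(p)
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp(log_front - log_beta) * total

def recurrence_residuals(x, p, q):
    """Left-hand sides of (8)-(14); each should vanish up to rounding."""
    def I(a, b):
        return inc_beta_ratio(x, a, b)
    return {
        8: x * I(p, q) - I(p + 1, q) + (1 - x) * I(p + 1, q - 1),
        9: (p + q - p * x) * I(p, q) - q * I(p, q + 1) - p * (1 - x) * I(p + 1, q - 1),
        10: q * I(p, q + 1) + p * I(p + 1, q) - (p + q) * I(p, q),
        11: (q * I(p + 1, q + 1) + ((p + q) * x - q) * I(p + 1, q)
             - (p + q) * x * I(p, q)),
        12: (p * I(p + 1, q + 1) + (q - (p + q) * x) * I(p, q + 1)
             - (p + q) * (1 - x) * I(p, q)),
        13: ((p + q - 1) * x * I(p - 1, q) - ((p + q - 1) * x + p) * I(p, q)
             + p * I(p + 1, q)),
        14: ((p + q) * (1 - x) * I(p + 1, q - 1)
             - ((p + q) * (1 - x) + q) * I(p + 1, q) + q * I(p + 1, q + 1)),
    }

print(recurrence_residuals(0.6, 5.5, 3.0))   # all entries of order 1e-15
```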


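By a repeated use of (10) it is possible to obtain the formulae

$$I_x(p + n, q) = \frac{1}{(p + n - 1)^{(n)}}\Big[(p + q + n - 1)^{(n)} I_x(p, q) - \binom{n}{1} q\, (p + q + n - 1)^{(n-1)} I_x(p, q + 1) + \dots + (-1)^n (q + n - 1)^{(n)} I_x(p, q + n)\Big], \tag{15}$$

$$I_x(p, q + n) = \frac{1}{(q + n - 1)^{(n)}}\Big[(p + q + n - 1)^{(n)} I_x(p, q) - \binom{n}{1} p\, (p + q + n - 1)^{(n-1)} I_x(p + 1, q) + \dots + (-1)^n (p + n - 1)^{(n)} I_x(p + n, q)\Big], \tag{16}$$

where $(p + q + n - 1)^{(n)}$, etc., refer to the factorial notation, e.g.,

$$(p + q + n - 1)^{(n)} = (p + q + n - 1)(p + q + n - 2) \cdots (p + q).$$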
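Formula (15) can be checked exactly in rational arithmetic. Write (10) as $I_x(P+1, Q) = [(P+Q)I_x(P, Q) - Q\, I_x(P, Q+1)]/P$ and apply it $n$ times. The resulting coefficients of $I_x(p, q+k)$ should match those read off (15). The sketch below does this with helper names of its own, using the text's example values $p = 10.5$, $q = 8$ and $p = 50$, $q = 48$.

```python
from fractions import Fraction
from math import comb

def falling(m, k):
    """The text's factorial notation m^(k) = m (m - 1) ... (m - k + 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= m - i
    return out

def expand_by_rule_10(p, q, n):
    """Coefficients c_k with I(p+n, q) = sum_k c_k I(p, q+k), from n applications
    of I(P+1, Q) = [(P+Q) I(P, Q) - Q I(P, Q+1)] / P, a rearrangement of (10)."""
    terms = {(n, 0): Fraction(1)}            # key (i, j) stands for I(p+i, q+j)
    for _ in range(n):
        nxt = {}
        for (i, j), c in terms.items():
            P, Q = p + i - 1, q + j
            nxt[(i - 1, j)] = nxt.get((i - 1, j), 0) + c * (P + Q) / P
            nxt[(i - 1, j + 1)] = nxt.get((i - 1, j + 1), 0) - c * Q / P
        terms = nxt
    return {j: c for (_, j), c in terms.items()}

def closed_form_15(p, q, n):
    """Coefficients of I(p, q+k) read off (15)."""
    denom = falling(p + n - 1, n)
    return {k: (-1) ** k * comb(n, k) * falling(q + k - 1, k)
               * falling(p + q + n - 1, n - k) / denom
            for k in range(n + 1)}

p, q = Fraction(21, 2), Fraction(8)          # the text's example I(10.5 + n, 8)
print(all(expand_by_rule_10(p, q, n) == closed_form_15(p, q, n) for n in range(1, 7)))
print(closed_form_15(Fraction(50), Fraction(48), 2))   # coefficients used for I(52, 48)
```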


3. An application. Formulae (15) and (16) may be used to write general
formulae for obtaining values of $I_x(p, q)$ where $p$ or $q$ may be greater than 50,
i.e., for such values outside the range of Karl Pearson's tables. In particular,

$$I_x(50 + n, q) = \frac{1}{(49 + n)^{(n)}}\Big[(q + n + 49)^{(n)} I_x(50, q) - \binom{n}{1} q\, (q + n + 49)^{(n-1)} I_x(50, q + 1) + \dots + (-1)^n (q + n - 1)^{(n)} I_x(50, q + n)\Big] \tag{17}$$

and

$$I_x(p, 50 + n) = \frac{1}{(49 + n)^{(n)}}\Big[(p + n + 49)^{(n)} I_x(p, 50) - \binom{n}{1} p\, (p + n + 49)^{(n-1)} I_x(p + 1, 50) + \dots + (-1)^n (p + n - 1)^{(n)} I_x(p + n, 50)\Big]. \tag{18}$$

It should be noted for (17) that as $n$ increases the range of values that can be
obtained outside Karl Pearson's tables is reduced, since the last term of (17)
contains $I_x(50, q + n)$. A similar observation is noted for (18). From a practical
standpoint the computational labor restricts $n$ to fairly small values. Using (17)
we may easily compute, for example,

$$I_{.60}(52, 48) = I_{.60}(50 + 2, 48) = \frac{1}{(51)(50)}\Big[(99)(98)I_{.60}(50, 48) - 2(99)(48)I_{.60}(50, 49) + (49)(48)I_{.60}(50, 50)\Big].$$
Substituting the necessary values from Karl Pearson's tables we calculate

$$I_{.60}(52, 48) = .9465248.$$

Similarly, using (18) we may calculate

$$I_{.40}(48, 52) = .0534752.$$

As a check on the computations, we use the well-known identity

$$I_x(p, q) = 1 - I_{1-x}(p', q'),$$

where $p' = q$ and $q' = p$. Then

$$I_{.40}(48, 52) = 1 - I_{.60}(52, 48) = 1 - .9465248 = .0534752.$$
In like manner formulae (15) and (16) may be used to write general formulae
for obtaining half values for $p$ or $q$ greater than 10.5, i.e., for values not
included in Karl Pearson's tables. In particular,

$$I_x(10.5 + n, q) = \frac{1}{(9.5 + n)^{(n)}}\Big[(9.5 + q + n)^{(n)} I_x(10.5, q) - \binom{n}{1} q\, (9.5 + q + n)^{(n-1)} I_x(10.5, q + 1) + \dots + (-1)^n (q + n - 1)^{(n)} I_x(10.5, q + n)\Big], \tag{19}$$

and

$$I_x(p, 10.5 + n) = \frac{1}{(9.5 + n)^{(n)}}\Big[(9.5 + p + n)^{(n)} I_x(p, 10.5) - \binom{n}{1} p\, (9.5 + p + n)^{(n-1)} I_x(p + 1, 10.5) + \dots + (-1)^n (p + n - 1)^{(n)} I_x(p + n, 10.5)\Big]. \tag{20}$$

Using (19) we may compute

$$I_{.60}(12.5, 8) = \frac{1}{(11.5)(10.5)}\Big[(19.5)(18.5)I_{.60}(10.5, 8) - 2(8)(19.5)I_{.60}(10.5, 9) + (9)(8)I_{.60}(10.5, 10)\Big] = .4512367.$$

Similarly, using (20) we obtain

$$I_{.40}(8, 12.5) = .5487633.$$

Employing the check formula,

$$I_{.40}(8, 12.5) = 1 - I_{.60}(12.5, 8) = 1 - .4512367 = .5487633.$$
Thanks are due to Dr. J. C. P. Miller, Technical Director, Scientific Computing
Service, Limited, London, England, for helpful suggestions in the preparation
of this paper.
ON A THEOREM BY WALD AND WOLFOWITZ
By GOTTFRIED E. NOETHER


New York University


Let $S_n = (h_1, \dots, h_n)$, ($n = 1, 2, \dots$), be sequences of real numbers and for
all $n$ denote by $H_{e_1 \cdots e_m}$ the symmetrical function generated by $h_1^{e_1} \cdots h_m^{e_m}$,
i.e., $H_{e_1 \cdots e_m} = \sum h_{i_1}^{e_1} \cdots h_{i_m}^{e_m}$, where the summation is extended over the $n(n - 1)
\cdots (n - m + 1)$ possible arrangements of the $m$ integers $i_1, \dots, i_m$ such that
$1 \le i_j \le n$ and $i_j \ne i_k$ ($j, k = 1, \dots, m$; $j \ne k$). According to Wald and Wolfowitz
[1] the sequences $S_n$ are said to satisfy condition W if for all integral $r > 2$

$$\frac{\frac{1}{n}\sum_{i=1}^{n} (h_i - \bar h)^r}{\left[\frac{1}{n}\sum_{i=1}^{n} (h_i - \bar h)^2\right]^{r/2}} = O(1),$$

where $\bar h = (1/n)\sum_{i=1}^{n} h_i$.
¹ The symbol O, as well as the symbols o and ~ to be used later, have their usual meaning.
See, e.g., Cramér [2, p. 122].

Given sequences $\mathfrak{A}_n = (a_1, \dots, a_n)$ and $\mathfrak{D}_n = (d_1, \dots, d_n)$, consider the
chance variable

$$L_n = d_1 x_1 + \dots + d_n x_n,$$

where the domain of $(x_1, \dots, x_n)$ consists of the $n!$ equally likely permutations
of the elements of $\mathfrak{A}_n$. Then it is shown in [1] that if the sequences $\mathfrak{A}_n$ and $\mathfrak{D}_n$
satisfy condition W, the distribution of $L'_n = (L_n - EL_n)/\sigma(L_n)$ approaches the
normal distribution with mean 0 and variance 1 as $n \to \infty$. These conditions
for asymptotic normality can be weakened. It will be shown that the following
theorem holds:


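THEOREM. $L'_n$ is asymptotically normal with mean 0 and variance 1 provided the
sequences $\mathfrak{D}_n$ satisfy condition W while for the sequences $\mathfrak{A}_n$

$$\frac{\sum_{i=1}^{n} (a_i - \bar a)^r}{\left[\sum_{i=1}^{n} (a_i - \bar a)^2\right]^{r/2}} = o(1), \qquad (r = 3, 4, \dots). \tag{1}$$

We note that $L'_n$ is not changed if $a_i$ is replaced by $[(1/n)\sum_j (a_j - \bar a)^2]^{-1/2}(a_i - \bar a)$
and $d_i$ by $[(1/n)\sum_j (d_j - \bar d)^2]^{-1/2}(d_i - \bar d)$. Therefore it is sufficient to prove
asymptotic normality provided (in the notation introduced above, $A_r = \sum_i a_i^r$ and
$D_r = \sum_i d_i^r$)

$$D_1 = 0, \quad D_2 = n, \quad D_r = O(n), \tag{2}$$

$$A_1 = 0, \quad A_2 = n, \quad A_r = o(n^{r/2}), \qquad (r = 3, 4, \dots). \tag{3}$$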
Then

$$EL_n = D_1 E x_1 = 0,$$

$$\operatorname{var} L_n = EL_n^2 = D_2 E x_1^2 + D_{11} E x_1 x_2 = \frac{1}{n} A_2 D_2 + \frac{1}{n(n-1)} (A_1^2 - A_2)(D_1^2 - D_2) \sim n,$$

and it is sufficient to show that $n^{-r/2} EL_n^r$ tends to the $r$th moment of a normal
distribution with mean 0 and variance 1.
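The expression for $EL_n^2$ holds for arbitrary sequences; only the final $\sim n$ uses (2) and (3). It can therefore be checked by brute force over all $n!$ permutations. The sketch below does this in exact rational arithmetic, with arbitrary illustrative integer sequences.

```python
from fractions import Fraction
from itertools import permutations

def second_moment_enumerated(a, d):
    """E L_n^2 over the n! equally likely permutations (x_1..x_n) of a,
    where L_n = d_1 x_1 + ... + d_n x_n."""
    perms = list(permutations(a))
    total = sum(Fraction(sum(di * xi for di, xi in zip(d, x))) ** 2 for x in perms)
    return total / len(perms)

def second_moment_formula(a, d):
    """(1/n) A_2 D_2 + (A_1^2 - A_2)(D_1^2 - D_2) / (n(n - 1)), A_r = sum a_i^r."""
    n = len(a)
    A1, A2 = sum(a), sum(v * v for v in a)
    D1, D2 = sum(d), sum(v * v for v in d)
    return Fraction(A2 * D2, n) + Fraction((A1 ** 2 - A2) * (D1 ** 2 - D2), n * (n - 1))

a = [3, -1, 4, 1, -5, 9]       # arbitrary illustrative sequences
d = [2, 7, -1, 8, 2, -8]
print(second_moment_enumerated(a, d) == second_moment_formula(a, d))
```

Under the normalization $A_1 = D_1 = 0$ and $A_2 = D_2 = n$, the formula gives exactly $n^2/(n-1)$. For example, $a = d = (1, -1, 1, -1)$ gives $16/3$.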


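Now we can write

$$\mu_r = n^{-r/2} EL_n^r = n^{-r/2} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} E\, d_{i_1} x_{i_1} \cdots d_{i_r} x_{i_r}$$

$$= n^{-r/2}\left[D_r E x_1^r + \dots + c(r, e_1, \dots, e_m)\, D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m} + \dots + D_{1 \cdots 1} E x_1 \cdots x_r\right], \tag{4}$$

where $e_1 + \dots + e_m = r$ with $e_k$ ($k = 1, \dots, m$) positive integral, and the
coefficient $c(r, e_1, \dots, e_m)$ stands for the number of ways in which the $r$ indices
$i_1, \dots, i_r$ can be tied in $m$ groups of size $e_1, \dots, e_m$, respectively, so as to
produce the terms of $D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m}$.
Since $E x_1^{e_1} \cdots x_m^{e_m} \sim n^{-m} A_{e_1 \cdots e_m}$, we have

$$n^{-r/2} D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m} \sim n^{-(r/2 + m)} D_{e_1 \cdots e_m} A_{e_1 \cdots e_m} = B(r, e_1, \dots, e_m). \tag{5}$$

LEMMA. $B(r, e_1, \dots, e_m) \to 0$ unless

$$m = r/2, \qquad e_1 = \dots = e_{r/2} = 2. \tag{6}$$

In that case $B(r, 2, \dots, 2) \sim 1$.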
Before proving this lemma we shall show that our theorem follows immediately.
By (4), $\mu_r$ is the sum of a finite number of expressions $B(r, e_1, \dots, e_m)$.
Therefore, if $r = 2s + 1$ ($s = 1, 2, \dots$), $\mu_{2s+1} \to 0$, since at least one of the $e_k$
($k = 1, \dots, m$) in all the $B(2s + 1, e_1, \dots, e_m)$ adding up to $\mu_{2s+1}$ must be odd.
If $r = 2s$, $\mu_{2s} \sim c(2s, 2, \dots, 2)$. Since the first index in (4) can be tied with any
one of the other $2s - 1$ indices, the next free index with any one of the remaining
$2s - 3$ indices, etc., it is seen that $\mu_{2s} \sim (2s - 1)(2s - 3) \cdots 3 \cdot 1$. However, these
are the moments of a normal distribution with mean 0 and variance 1. This
proves the theorem.
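The counting step can be confirmed directly. The sketch below counts the ways of tying $2s$ indices in pairs by the same "first free index" recursion described above. It then compares the count with $E Z^{2s}$ for a standard normal $Z$, computed by a simple midpoint rule. The helper names and the quadrature settings are illustrative choices.

```python
import math

def count_pairings(indices):
    """Number of ways to tie the given indices together in groups of two."""
    if not indices:
        return 1
    rest = indices[1:]
    # the first free index can be tied with any one of the remaining ones
    return sum(count_pairings(rest[:k] + rest[k + 1:]) for k in range(len(rest)))

def normal_moment(r, steps=20_000, half_width=12.0):
    """E Z^r for Z ~ N(0, 1), by the midpoint rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * h
        total += z ** r * math.exp(-z * z / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

for s in range(1, 5):
    print(2 * s, count_pairings(list(range(2 * s))), round(normal_moment(2 * s), 6))
```

An odd number of indices cannot be tied in pairs at all, and `count_pairings` returns 0 in that case. This matches the vanishing odd moments.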


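Proof of Lemma. Define $A(j_1, \dots, j_h) = A_{j_1} \cdots A_{j_h}$. Then $A_{e_1 \cdots e_m}$ is the
sum of a finite number of expressions $A(j_1, \dots, j_h)$, where the $j_g$ ($g = 1, \dots, h$)
are obtained from $e_1, \dots, e_m$ by addition in such a way that

$$j_1 + \dots + j_h = e_1 + \dots + e_m = r. \tag{7}$$

Since by (3) $A_1 = 0$, we need only consider those $A(j_1, \dots, j_h)$ for which
$j_g \ge 2$ ($g = 1, \dots, h$). If some $j_g > 2$, by (3) and (7)

$$A(j_1, \dots, j_h) = o(n^{r/2}). \tag{8}$$

If $j_g = 2$ for all $g$,

$$A(2, \dots, 2) = A_2^{r/2} = n^{r/2}. \tag{9}$$

This last case can only happen if $r$ is even and each $e_k$ ($k = 1, \dots, m$) equals either
1 or 2. Therefore, unless (6) is true,

$$A_{e_1 \cdots e_m} = o(n^{r/2}) \quad \text{or} \quad m > r/2, \tag{10}$$

while in every case $A_{e_1 \cdots e_m} = O(n^{r/2})$ by (8) and (9).
Similarly, writing $D_{e_1 \cdots e_m}$ as a sum of products of the kind $D_{j_1} \cdots D_{j_h}$, it is
seen that by (2)

$$D_{e_1 \cdots e_m} = \begin{cases} O(n^m) & \text{if } m \le r/2, \\ O(n^{r/2}) & \text{if } m > r/2. \end{cases} \tag{11}$$

Thus by (8)-(11)

$$n^{-(r/2 + m)} D_{e_1 \cdots e_m} A_{e_1 \cdots e_m} = o(1) \quad \text{unless (6) holds}, \tag{12}$$

$$A_{2 \cdots 2} \sim A_2^{r/2} = n^{r/2}, \tag{13}$$

$$D_{2 \cdots 2} \sim D_2^{r/2} = n^{r/2}. \tag{14}$$

(12)-(14) together with (5) prove the lemma.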


Let $a_1, a_2, \dots$ be independent observations on the same chance variable $Y$.
We may ask what conditions have to be imposed on the distribution of $Y$ to
insure, at least with probability 1, that condition (1) is satisfied. Wald and
Wolfowitz state in Corollary 2 of [1] that, provided $Y$ has positive variance and
finite moments of all orders, the $a_1, a_2, \dots$ satisfy condition W with probability
1 and therefore insure asymptotic normality of $L'_n$ provided the sequences $\mathfrak{D}_n$
satisfy condition W. On the other hand, it can be shown that the $a_1, a_2, \dots$
satisfy condition (1) with probability 1 provided $Y$ has positive variance and
a finite absolute moment of order 3. Thus condition (1) constitutes a considerable
improvement over condition W.
ON SUMS OF SYMMETRICALLY TRUNCATED NORMAL RANDOM
VARIABLES¹


By Z. W. BIRNBAUM AND F. C. ANDREWS


University of Washington, Seattle


1. Introduction. Let $X_a$ be the random variable with the probability density

$$f_a(x) = \begin{cases} C e^{-x^2/2} & \text{for } |x| < a, \\ 0 & \text{for } |x| \ge a, \end{cases} \tag{1.1}$$

obtained from the normal probability density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ by symmetrical truncation
at the "terminus" $|x| = a$, and let $S_a^{(m)}$ be the sum of $m$ independent sample
values of $X_a$. We consider the following problem: An integer $m \ge 2$ and the real
numbers $A > 0$, $\epsilon > 0$ are given; how does one have to choose the terminus $a$
so that the probability of $|S_a^{(m)}| \ge A$ is equal to $\epsilon$,

$$P(|S_a^{(m)}| \ge A) = \epsilon? \tag{1.2}$$

This problem arises, for example, when single components of a product are manufactured under statistical quality control, so that each component has the length $L = l + X$, where $X$ has the probability density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, and the final product consists of $m$ components, so that its total length $S$ is the sum of the lengths of the components. We wish to have probability $1 - \varepsilon$ that $S$ differs from $ml$ by not more than a given $A$. To achieve this we decide to reject each single component for which $|L - l| = |X| > a$; how do we determine $a$?


The exact solution of this problem would require laborious computations.² In the present paper methods are given for obtaining approximate values of $a$ which are "safe", that is, such that

(1.3) $P\big(|S_a^{(m)}| \ge A\big) \le \varepsilon.$

1 Research done under the sponsorship of the Office of Naval Research. 

² A similar problem has been studied by V. J. Francis [2] for one-sided truncation; he actually had the exact probabilities for the solution of his problem computed and tabulated for $m = 2, 4$.
In deriving these safe values, use will be made of theorems on random variables with comparable peakedness, for which the reader is referred to a previous paper [1].


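2. The safe value $a_1$. For fixed $a > 0$, we consider the normal random variable $Y_a$ with expectation 0 and with probability density $g_a(y)$ such that $g_a(0) = f_a(0)$. Equating $g_a(0) = 1/(\sigma_a\sqrt{2\pi})$ with $f_a(0) = C = 1 \big/ \int_{-a}^{a} e^{-t^2/2}\,dt$, it is easily seen that $Y_a$ has the standard deviation

(2.1) $\sigma_a = \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^a e^{-t^2/2}\,dt,$

and that $g_a(\xi) < f_a(\xi)$ for $|\xi| < a$, $g_a(\xi) > 0 = f_a(\xi)$ for $|\xi| > a$. Hence, applying Theorem 1 in [1], we conclude that

(2.2) $P\big(|S_a^{(m)}| \ge A\big) \le \sqrt{\dfrac{2}{\pi}} \displaystyle\int_{A/(\sigma_a\sqrt{m})}^{\infty} e^{-t^2/2}\,dt.$

If $m$, $A$, and $\varepsilon$ are given, we determine $E_\varepsilon$ from tables of the normal probability integral so that $\sqrt{2/\pi}\int_{E_\varepsilon}^{\infty} e^{-t^2/2}\,dt = \varepsilon$, set $\sigma_a = A/(E_\varepsilon\sqrt{m})$ in (2.1), and solve the equation

(2.3) $\dfrac{A}{E_\varepsilon\sqrt{m}} = \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^a e^{-t^2/2}\,dt$

for $a$, using again tables of the normal probability integral. In view of (2.2) this solution satisfies (1.3) and hence is safe; it will be denoted by $a_1$.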
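As a sketch of the procedure just described, the code below replaces the tables of the normal probability integral by Python's `statistics.NormalDist`. It relies on the identity $\sqrt{2/\pi}\int_0^a e^{-t^2/2}\,dt = 2\Phi(a) - 1$, with $\Phi$ the standard normal distribution function, so that (2.3) can be inverted in closed form. The helper name `safe_a1` is illustrative and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def safe_a1(A: float, m: int, eps: float) -> float:
    """Safe terminus a_1 of Section 2 (hypothetical helper name)."""
    # E_eps: two-sided deviate with sqrt(2/pi) * int_E^inf exp(-t^2/2) dt = eps.
    E = STD_NORMAL.inv_cdf(1.0 - eps / 2.0)
    # (2.3) sets sigma_a = A / (E * sqrt(m)); by (2.1), sigma_a = 2*Phi(a) - 1 < 1.
    sigma = A / (E * sqrt(m))
    if not 0.0 < sigma < 1.0:
        raise ValueError("A / (E_eps * sqrt(m)) must lie strictly between 0 and 1")
    # Solve 2*Phi(a) - 1 = sigma for a.
    return STD_NORMAL.inv_cdf((1.0 + sigma) / 2.0)


if __name__ == "__main__":
    print(round(safe_a1(3.8, 4, 0.05), 3))  # Example 1 of Section 6; prints 2.162
```

Feeding in the tabulated crossover values of Table I (for instance $A = 3.425$, $m = 4$, $\varepsilon = .05$) reproduces the listed $a'$ to about three decimals.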


3. The safe value $a_2$. A direct application of Theorem 2 in [1] yields the inequality

(3.1) $P\big(|S_a^{(m)}| \ge A\big) \le \dfrac{1}{2^{m-1}\,m!} \displaystyle\sum_{(m + A/a)/2 \le j \le m} (-1)^j \binom{m}{j} \Big(\dfrac{A}{a} + m - 2j\Big)^m = h_m\Big(\dfrac{A}{a}\Big)$

for $0 < A < ma$. Hence by equating $h_m(A/a)$ to $\varepsilon$ and solving for $a$, we obtain a safe value, which will be denoted by $a_2$. It is of interest to note that (3.1) is true not only for $f_a(x)$ defined by (1.1), i.e., the truncated normal density, but for any probability density $f_a(x)$ which is symmetrical and unimodal, since these are the only assumptions needed for Theorem 2 in [1].
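The right-hand side of (3.1) is the two-tail probability $P(|U_1 + \cdots + U_m| \ge A)$ for independent $U_i$ uniform on $(-a, a)$, so $h_m$ can be evaluated directly and $A/a_2$ found numerically instead of read from Table II. A minimal sketch: the names `h_m` and `ratio_A_over_a2` are hypothetical, and bisection is our choice of root finder, relying on $h_m(u)$ decreasing from 1 to 0 on $0 < u < m$.

```python
from math import comb, factorial


def h_m(m: int, u: float) -> float:
    """Right-hand side of (3.1) with u = A/a, valid for 0 < u < m."""
    total = 0.0
    for j in range(m + 1):
        if 2 * j >= m + u:  # summation range (m + A/a)/2 <= j <= m
            total += (-1) ** j * comb(m, j) * (u + m - 2 * j) ** m
    return total / (2 ** (m - 1) * factorial(m))


def ratio_A_over_a2(m: int, eps: float, tol: float = 1e-12) -> float:
    """Solve h_m(u) = eps for u = A/a_2 by bisection on (0, m)."""
    lo, hi = 0.0, float(m)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_m(m, mid) > eps:
            lo = mid  # bound still above eps: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, round(ratio_A_over_a2(m, 0.05), 3))
```

For Example 2 of Section 6 ($A = 3$, $m = 4$, $\varepsilon = .02$), `3 / ratio_A_over_a2(4, 0.02)` gives $a_2 \approx 1.154$, in line with the value quoted there.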


4. Solution for large $m$. The random variable $X_a$ has the variance

(4.1) $\sigma^2(X_a) = 1 + \dfrac{2\phi''(a)}{2\phi(a) - 1},$

where $\phi(x) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{x} e^{-t^2/2}\,dt$. Hence, according to the central limit theorem, we have the approximate equality

(4.2) $P\big(|S_a^{(m)}| \ge A\big) \approx \dfrac{2}{\sqrt{2\pi}} \displaystyle\int_{A/(\sigma(X_a)\sqrt{m})}^{\infty} e^{-t^2/2}\,dt$

for $m$ sufficiently large.

It can reasonably be expected that the cumulative distribution of $S_a^{(m)}$ differs from its limiting normal probability integral by less than the cumulative distribution of the sum $U_a^{(m)}$ of $m$ independent uniform variables in $(-a, +a)$ differs from its limiting normal probability integral. Already for $m = 4$ the cumulative distribution of $U_a^{(m)}$ differs from the corresponding normal cumulative distribution by less than .0075. Equally good or better approximation may, therefore, be expected for the distribution of $S_a^{(m)}$, so that the error in the approximate equality (4.2) between the two-tail probabilities should be less than .015 for $m = 4$, and still less for $m > 4$.

Equating the right-hand term of (4.2) to $\varepsilon$ and solving for $\sigma^2(X_a)$, we obtain

$\sigma^2(X_a) = 1 + \dfrac{2\phi''(a)}{2\phi(a) - 1} = \dfrac{1}{m}\Big(\dfrac{A}{E_\varepsilon}\Big)^2,$

an equation which can be solved for $a$ with the aid of tables of $\phi(x)$ and $\phi''(x)$. We denote this value of $a$ by $a_3$.
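The equation for $a_3$ can also be solved numerically instead of from tables. This is a minimal sketch assuming only Python's standard library: `math.erf` supplies $2\phi(a) - 1$, `statistics.NormalDist` supplies $E_\varepsilon$, and bisection (our choice) uses the fact that $\sigma^2(X_a)$ increases from 0 to 1 as $a$ grows. The helper names are hypothetical.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def truncated_variance(a: float) -> float:
    """sigma^2(X_a) from (4.1): variance of N(0, 1) truncated to |x| <= a."""
    density = exp(-a * a / 2.0) / sqrt(2.0 * pi)
    central_mass = erf(a / sqrt(2.0))  # equals 2*phi(a) - 1 in the paper's notation
    # phi''(a) = -a * density, so (4.1) reads 1 - 2*a*density / (2*phi(a) - 1).
    return 1.0 - 2.0 * a * density / central_mass


def safe_a3(A: float, m: int, eps: float, tol: float = 1e-10) -> float:
    """Solve sigma^2(X_a) = (A / E_eps)^2 / m for a (hypothetical helper name)."""
    E = NormalDist().inv_cdf(1.0 - eps / 2.0)
    target = (A / E) ** 2 / m
    if not 0.0 < target < 1.0:
        raise ValueError("(A / E_eps)^2 / m must lie strictly between 0 and 1")
    lo, hi = 1e-6, 40.0  # sigma^2(X_a) rises from about 0 to 1 on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_variance(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For small $a$ the variance behaves like $a^2/3$, the variance of a uniform variable on $(-a, a)$, which is a convenient sanity check on (4.1).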




5. Use of the different solutions in practice. From the foregoing it appears that the following procedure may be followed in solving our problem in any definite case:

If $m$ is large, $a_3$ is very close to the exact solution of (1.3) and may be used safely.

If $m$ is not large but $m \ge 5$, it is conjectured that $a_3$ is such that the left-hand term in (1.3), for $a = a_3$, differs from $\varepsilon$ by less than 0.015.

If $m \le 4$, the larger of $a_1$ and $a_2$ should be used. Table I contains the values of $A$ for which $a_1$ and $a_2$ have the same value, say $a'$; $a_1$ or $a_2$ should be used according as the given $A$ is greater or smaller than the tabulated value. The value $a_1$ is easily computed from a table of the normal probability integral by the procedure of section 2. The value $a_2$ can be obtained by reading off $A/a_2$ from Table II.








TABLE I
Values of $A$ for which $a_1 = a_2 = a'$ for given $m$, $\varepsilon$

             m = 2            m = 3            m = 4
  ε        A      a'        A      a'        A      a'
 .001    4.568  2.357     5.446  2.008     6.152  1.842
 .002    4.258  2.228     5.059  1.918     5.716  1.779
 .005    3.808  2.047     4.512  1.799     5.111  1.697
 .01     3.438  1.910     4.074  1.712     4.632  1.640
 .02     3.034  1.765     3.614  1.630     4.131  1.589
 .05     2.456  1.581     2.970  1.533     3.425  1.529

TABLE II
Values of $A/a_2$ for given $m$, $\varepsilon$

  ε      m = 2   m = 3   m = 4
 .001    1.937   2.712   3.339
 .002    1.911   2.637   3.213
 .005    1.859   2.507   3.011
 .01     1.800   2.379   2.824
 .02     1.718   2.217   2.600
 .05     1.553   1.937   2.240
6. Examples.

1) $A = 3.8$, $m = 4$, $\varepsilon = .05$. Since $A$ is greater than the value 3.425 in Table I, we compute $a_1 = 2.162$. From Table II we would obtain $A/a_2 = 2.240$ and thus $a_2 = 1.696 < a_1$.

2) $A = 3$, $m = 4$, $\varepsilon = .02$. Since $A < 4.131$, we read $A/a_2 = 2.600$ from Table II and obtain $a_2 = 1.153$, which will be greater than $a_1$.

3) $A = 5$, $m = 30$, $\varepsilon = .05$. Using the method of section 4 we obtain $a_3 = 1.62$.
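Example 1 can also be checked by simulation. The sketch below estimates the left-hand side of (1.3) by Monte Carlo, drawing from the truncated density (1.1) by simple rejection from the standard normal; the number of trials, the seed, and the helper names are arbitrary choices of ours. For $a_1 = 2.162$, $A = 3.8$, $m = 4$ the estimate should stay below $\varepsilon = .05$, as a safe value requires.

```python
import random


def truncated_normal(a: float, rng: random.Random) -> float:
    """One draw from (1.1): a standard normal, redrawn until |x| <= a."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= a:
            return x


def tail_probability(a: float, m: int, A: float,
                     trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(|S_a^(m)| >= A)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(truncated_normal(a, rng) for _ in range(m))
        if abs(s) >= A:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(tail_probability(2.162, 4, 3.8))
```

Rejection sampling is adequate here because only a few percent of normal draws fall outside $|x| \le 2.162$; for much smaller termini a dedicated truncated-normal sampler would be more efficient.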
A CERTAIN CUMULATIVE PROBABILITY FUNCTION
By Sister Mary Agnes Hatke, O.S.F.
St. Francis College, Ft. Wayne, Indiana


Graduations of empirically observed distributions show that the cumulative 
probability function $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ is a practical tool for fitting a smooth curve to observed data. The graduations are comparable with those obtained by the Pearson system, Charlier, and others, and are accomplished with simple calculations. Given distributions are graduated by the method of moments. Theoretical frequencies are obtained by evaluation of consecutive values of $F(x)$ by use of calculating machines and logarithms, and by differencing $NF(x)$. Neither integration nor heavy interpolation is involved, such as may be required in graduation by a classical frequency function. Burr [1] constructed tables of $\mu$, $\sigma$, $\alpha_3$, and $\alpha_4$ values for the function $F(x)$ for certain combinations of integral values of $1/c$ and $1/k$. In these tables curvilinear interpolation must be used in finding an $F(x)$ with desired moments. The writer constructed more extensive tables for the same cumulative function with $c$ and $k$ a variety of real positive numbers less than or equal to one, such that linear interpolation can be used to determine the parameters $c$ and $k$ for an $F(x)$ that has $\alpha_3$ and $\alpha_4$ approximately the same as those of the distribution to be graduated. These
tables have been deposited with Brown University. Microfilm or photostat copies 
may be obtained upon request to the Brown University Library. 

The writer used the definitions of cumulative moments and the formulas for the ordinary moments $\mu$, $\sigma$, $\alpha_3$, and $\alpha_4$ in terms of cumulative moments as developed by Burr. These latter moments were tabulated for the function $F(x)$ having various combinations of parameters $c$ and $k$, $c$ ranging from 0.050 to 0.675 and $k$ from 0.050 to 1.000, each at intervals of 0.025. Within these ranges only those combinations of $c$ and $k$ were used which yielded $\alpha_3$ of approximately 1 or less and $\alpha_4$ values of 6 or less, since such moments are most common in practice.

It can be verified that over most of the area of the table $\alpha_3$ values obtained by linear and by curvilinear interpolation on $k$ (or on $c$) differ by less than 0.001, and values of $\alpha_4$ by approximately 0.01 or less. If $\alpha_3 = $ constant and $\alpha_4 = $ constant curves are plotted on $c$, $k$ axes, it will be seen that there exists only one solution $(c, k)$ of the equations $\alpha_3 = B(c, k)$ and $\alpha_4 = C(c, k)$. Furthermore, some $\alpha_4$ curves intersect two $\alpha_3$ curves representing the same $|\alpha_3|$. Thus the chance of finding an appropriate function $F(x)$ for graduation is increased, since by reversal of scale an $F(x)$ with a positive $\alpha_3$ may be used to graduate a distribution with a negative $\alpha_3$, and conversely.

Fig. 1. The $\alpha_3$, $\delta$ chart for the Pearson system of frequency curves and the area covered by $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ (subscript L = bell-shaped).

Graduation of an observed frequency distribution is easily accomplished. Linear interpolation on $k$ for a fixed $c$ seems to be the best method for determining the parameters of an $F(x)$ that has $\alpha_3$ exactly the same and $\alpha_4$ nearly the same as the observed $\alpha_3$ and $\alpha_4$. If the observed $\alpha_3$ and $\alpha_4$ are fairly close to an entry in the table, no interpolation is required. Direct linear interpolation is used to determine $\mu$ and $\sigma$ for the $c$ and $k$ just found. Letting $M$ and $S$ be the mean and standard deviation of the given distribution, the formula

$x = \mu + \sigma\,\dfrac{X - M}{S}$


is used to translate the class limits $X$ of the given distribution to the corresponding $x$'s of $F(x)$. For any $x$ that is negative the quantity $1 + x^{1/c}$ is taken as one, so that $F(x) = 0$ for negative $x$, in accordance with the definition of $F(x)$ [1]. The values of $(1 + x^{1/c})^{-1/k}$ for the various $x$'s are computed by logarithms and differenced to obtain the probabilities for the given class intervals, according to the equation

$P(a < x < b) = \displaystyle\int_a^b f(x)\,dx = F(b) - F(a).$

The respective theoretical frequencies are these probabilities multiplied by $N$, the number of cases.
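The graduation steps above can be sketched in a few lines. Instead of interpolating $\mu$ and $\sigma$ in the tables, this sketch computes them from the raw moments of $F(x)$, which (for Burr's function with exponents $1/c$ and $1/k$) are $E(x^r) = \frac{1}{k}B\big(\frac{1}{k} - rc,\ 1 + rc\big)$ when $rck < 1$; that closed form is a standard Burr-distribution result assumed here, not something stated in this paper. The function names and any class limits passed to them are hypothetical.

```python
from math import gamma


def burr_cdf(x: float, c: float, k: float) -> float:
    """F(x) = 1 - (1 + x**(1/c))**(-1/k), taken as 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + x ** (1.0 / c)) ** (-1.0 / k)


def burr_raw_moment(r: int, c: float, k: float) -> float:
    """E(x^r) for F(x); finite only when r * c * k < 1."""
    if r * c * k >= 1.0:
        raise ValueError("moment of order r does not exist")
    p, q = 1.0 / k - r * c, 1.0 + r * c
    return (1.0 / k) * gamma(p) * gamma(q) / gamma(p + q)  # (1/k) * B(p, q)


def graduate(limits, N, c, k, M, S):
    """Theoretical class frequencies for class limits X with observed mean M, s.d. S."""
    mu = burr_raw_moment(1, c, k)
    sigma = (burr_raw_moment(2, c, k) - mu ** 2) ** 0.5
    xs = [mu + sigma * (X - M) / S for X in limits]  # x = mu + sigma (X - M) / S
    F = [burr_cdf(x, c, k) for x in xs]
    return [N * (F[i + 1] - F[i]) for i in range(len(F) - 1)]
```

Differencing $F$ rather than the worksheet column $N/(1 + x^{1/c})^{1/k}$ gives the same frequencies, since the two differ only by the constant $N$ and a sign.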

The headings that proved satisfactory for the columns of the graduation work-sheet are: class intervals (in observed physical units), $X$ ($u$ if unit class-interval is used), $f_{\text{obs}}$, $x$, $1 + x^{1/c}$, $N/(1 + x^{1/c})^{1/k}$, and $f_{\text{th}}$.

The relation of $F(x)$ to the Pearson system of frequency curves is presented in Figure 1, which is a reproduction of a major part of Craig's chart for $\alpha_3$ and $\delta$ [2]. In this chart the parameters of the twelve Pearson curves are expressed in terms of $\alpha_3$ and $\delta$, where $\delta = (2\alpha_4 - 3\alpha_3^2 - 6)/(\alpha_4 + 3)$. Values of $\alpha_3$ and $\delta$ were computed for $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ in which $c$ and $k$ were assigned the values listed in the $\alpha_3$, $\alpha_4$ table. The dotted area superimposed on the Craig chart is that covered by these $\alpha_3$, $\delta$ values for $F(x)$. Although it is small in size compared to the total area, it contains a part of the areas representing the three main Pearson curves, I, IV, and VI, as well as the point for the normal curve and part of the line on which lie the points corresponding to the bell-shaped curves of the Type III functions. It also includes transitional Types V and VII. Thus the function $F(x)$ covers part of an important area on the $\alpha_3$, $\delta$ chart for the Pearson curves.
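The computation of $\alpha_3$ and $\delta$ for a given pair $(c, k)$ can be sketched as follows. The raw moments use the standard Burr-distribution formula $E(x^r) = \frac{1}{k}B\big(\frac{1}{k} - rc,\ 1 + rc\big)$, an outside result assumed here and valid when $4ck < 1$; the function names are hypothetical.

```python
from math import gamma


def burr_raw_moment(r: int, c: float, k: float) -> float:
    """E(x^r) for F(x) = 1 - (1 + x**(1/c))**(-1/k); requires r * c * k < 1."""
    p, q = 1.0 / k - r * c, 1.0 + r * c
    return (1.0 / k) * gamma(p) * gamma(q) / gamma(p + q)


def alpha3_alpha4(c: float, k: float):
    """Standardized third and fourth moments of F(x)."""
    m1, m2, m3, m4 = (burr_raw_moment(r, c, k) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


def craig_delta(alpha3: float, alpha4: float) -> float:
    """Craig's delta = (2*alpha4 - 3*alpha3**2 - 6) / (alpha4 + 3)."""
    return (2 * alpha4 - 3 * alpha3 ** 2 - 6) / (alpha4 + 3)
```

For example, `craig_delta(*alpha3_alpha4(0.3, 0.5))` locates one $(c, k)$ pair on the chart, and the normal curve ($\alpha_3 = 0$, $\alpha_4 = 3$) falls at $\delta = 0$.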

The function $F(x)$ was used to graduate satisfactorily several observed distributions classified as Pearson types, including the three main Types I, IV, and VI, and transitional Types III and VII.

One advantage in the use of this cumulative function $F(x)$ is that it takes but one symbolic form within the area covered, whereas the Pearson-system curves require several different expressions of various complexity requiring identification of type. Furthermore, graduation by a Pearson function generally involves approximate integration or heavy interpolation in the incomplete beta function tables for the evaluation of the integrals of the Pearson functions, whereas graduation by a function $F(x)$ is easily and quickly performed, since $F(x)$ involves only two number-parameters readily determined by means of the $\alpha_3$, $\alpha_4$ table and straight arithmetic.


The writer is deeply indebted to Professor Irving W. Burr of Purdue Uni- 
versity for valuable suggestions in this study. 
ABSTRACTS OF PAPERS


(Presented at the Berkeley Meeting of the Institute, June 16-18, 1949) 


1. Extension of a Theorem of Blackwell. E. W. Barankin, University of California, Berkeley.


It is proved that Blackwell's method of uniformly improving the variance of an unbiased estimate by taking the conditional expectation with respect to a sufficient statistic is, in fact, similarly effective on every absolute central moment of order $s \ge 1$. The method leads to finer detail concerning the relationship between an estimate and its thus derived one. (This paper was prepared with the partial support of the Office of Naval Research.)


2. On the Existence of Consistent Tests. Agnes Berger, Columbia University, New York.


Let $\mathfrak{M}(B)$ denote the space of all probability measures defined over a common Borel field $B$. Let $\{m\} = M$, $\{m'\} = M'$ be two disjoint subsets of $\mathfrak{M}(B)$ and let $H_0$ ($H_1$) be the hypothesis stating that the unknown distribution is in $M$ ($M'$). In Neyman's terminology $H_0$ can be consistently tested against $H_1$ if to any preassigned $\varepsilon > 0$ there exist an integer $n$ and a critical region in the product space of $n$ independent observations such that the probabilities of the errors of the first and second kind corresponding to this region are simultaneously smaller than $\varepsilon$. A sufficient condition, which for a certain type of consistent test is also necessary, is established. The condition is satisfied whenever the disjoint sets $M$ and $M'$ are closed and compact with respect to a certain suitable topology introduced on $\mathfrak{M}(B)$. Thus, for instance, $H_0$ can be consistently tested against $H_1$ if $M$ and $M'$ contain only a finite number of measures, or if the measures in $M$ resp. $M'$ depend continuously on a parameter ranging over a closed and bounded subset of some Euclidean space.


3. Effect of Linear Truncation in a Multinormal Population. Z. William Birnbaum, University of Washington, Seattle.


Let $(X, Y_1, Y_2, \ldots, Y_{n-1})$ have a non-singular $n$-dimensional normal probability density $f(X, Y_1, Y_2, \ldots, Y_{n-1})$ for which all parameters are given, and let $\phi(X, Y_1, Y_2, \ldots, Y_{n-1})$ be the probability density obtained from $f$ by truncation along a given hyperplane: $\phi = Cf$ for $a_1Y_1 + \cdots + a_{n-1}Y_{n-1} \le aX + b$, $\phi = 0$ elsewhere. What is the marginal distribution of $X$ for this truncated distribution? This question can be answered by using a set of tables with only two parameters. These tables also make it possible to solve problems such as: determine the plane of truncation so that the marginal distribution of $X$ has certain required properties. (This paper was prepared under the sponsorship of the Office of Naval Research.)


4. Statistical Problems in the Theory of Counters. (Preliminary Report). Colin R. Blyth, University of California, Berkeley.


The assumptions made about counter action and distribution of incident particles are the same as those of B. V. Gnedenko [On the theory of Geiger-Müller counters, Journ. Exper. i Teor. Phiz., Vol. 11 (1941)]. The distribution of the number $X$ of particles registered during a given time $(0, t)$ is found explicitly in terms of the density $a(v)$ of incident particles at time $v$. The problem considered is that of estimating the parameters of $a(v)$. For the special case $a(v) = a$, the distribution of $X$ reduces to

$P\{X = x\} = e^{-a(t - x\tau)} \sum_{i=0}^{x} \frac{[a(t - x\tau)]^i}{i!} - e^{-a(t - (x-1)\tau)} \sum_{i=0}^{x-1} \frac{[a(t - (x-1)\tau)]^i}{i!}$ for $x = 1, 2, \ldots, s$;

$P\{X = 0\} = e^{-at}$; $P\{X = s + 1\} = 1 - e^{-a(t - s\tau)} \sum_{i=0}^{s} \frac{[a(t - s\tau)]^i}{i!}$; $P\{X > s + 1\} = 0$.

This distribution has been found in another problem by J. Neyman [On the problem of estimating the number of schools of fish, submitted to Statistical Series, Univ. of Calif. press]. For this special case the maximum likelihood


estimate @ of a is found to be given by @r exp (Gr) = {i + 7/(t — ar)}72r/(t — ar). If 
$\tau/(t - x\tau)$ is small, as will usually be the case, $\hat a$ will be close to the estimate $x/(t - x\tau)$ usually used for $a$.
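The special-case distribution can be checked numerically. This sketch assumes a Poisson stream of rate $a$, a counter that is free at time 0 and dead for a fixed time $\tau$ after each registration, and $s$ the largest integer with $s\tau \le t$; these modelling details are our reading, since the abstract does not restate them. Under those assumptions the $x$-th registration time is $(x-1)\tau$ plus a gamma variable, so $P\{X \ge x\}$ is a Poisson tail with mean $a(t - (x-1)\tau)$, and the point probabilities are differences of such tails.

```python
from math import exp, floor


def poisson_cdf(n: int, lam: float) -> float:
    """P{N <= n} for N ~ Poisson(lam)."""
    term = exp(-lam)
    total = term
    for i in range(1, n + 1):
        term *= lam / i
        total += term
    return total


def registered_counts_pmf(a: float, t: float, tau: float) -> list:
    """[P{X = 0}, ..., P{X = s + 1}] for a constant incident density a(v) = a."""
    s = floor(t / tau)

    def at_least(x: int) -> float:
        # P{X >= x} = P{Gamma(x, a) <= t - (x-1)*tau} = P{Poisson(a*window) >= x}.
        if x == 0:
            return 1.0
        window = t - (x - 1) * tau
        if window <= 0.0:
            return 0.0
        return 1.0 - poisson_cdf(x - 1, a * window)

    return [at_least(x) - at_least(x + 1) for x in range(s + 2)]
```

The first and last entries reproduce $P\{X = 0\} = e^{-at}$ and the stated $P\{X = s + 1\}$, and the list sums to 1 because the tails telescope.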


5. Some Two-Sample Tests. Douglas G. Chapman, University of California, Berkeley.

Let $X$, $Y$ be random variables normally distributed with means $\xi$, $\eta$, variances $\sigma_x^2$, $\sigma_y^2$ respectively. The two-sample procedure formulated by Stein to obtain a test with power independent of $\sigma$ for the hypothesis $\eta = \xi$ is used here to determine a test for the hypothesis $\xi/\eta = r$ ($r$ any preassigned real number). The size and power of this test are independent of $\sigma_x$ and $\sigma_y$. The two-sample procedure may be extended to the more general case of testing the hypothesis of equality of means of several normal populations, the variances being unknown. Approximate tests are obtained for this case. Finally it is shown that this two-sample procedure can be used to select that normal population, of several, with the greatest mean, the rule of selection having a preassigned level of accuracy. (This paper was prepared with the partial support of the Office of Naval Research.)


6. Minimum Variance in Non-Regular Estimation. R. C. Davis, U. S. Naval Ordnance Test Station, Inyokern.


The Cramér-Rao inequality for the minimum variance of a regular estimate of an unknown parameter of a probability distribution is extended to a broad class of non-regular types of estimation. The theory is developed only for the case in which a probability density function and a sufficient statistic for the unknown parameter exist. For every non-regular estimation problem included in the above class, it is proved that there exists a unique unbiased estimate which attains minimum variance, and a method is given for obtaining the sample estimate. Examples are given, such as the rectangular distribution, a class of truncated distributions, etc.


7. Auxiliary Random Variables. Mark W. Eudey, California Municipal Statistics, Inc., San Francisco.


In testing hypotheses concerning discontinuous random variables it is not possible to find regions of arbitrary size, and so if we compare two critical regions, selection between them on the basis of the usual criteria of the Neyman-Pearson theory of testing hypotheses may be confused by the difference in their sizes. This difficulty may be avoided by allowing the statistician to use a mixed strategy in such cases, and to make his decision to accept or reject the hypothesis depend upon an independent auxiliary random variable. For example, if $A$ is a binomial variable, and $U$ has a uniform distribution on $(0, 1)$, then $Z = A + U$ may be used to test hypotheses concerning the binomial parameter, and regions of any size may be found. For the binomial case this procedure leads to a class of uniformly most powerful tests for one-sided alternatives, and to uniformly most powerful unbiased tests for two-sided alternatives. Similar results are obtained for other common discontinuous variables, and the same device may be used in considering confidence regions and decision functions for such variables. (This paper was prepared with the partial support of the Office of Naval Research.)
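For the binomial case the auxiliary-variable device is easy to make concrete. In this sketch, assumed for illustration, the binomial count (written $A$ in the abstract) is $X \sim \mathrm{Bin}(n, p)$, $U$ is uniform on $(0, 1)$, and the one-sided test rejects $H_0: p = p_0$ when $Z = X + U \ge c$. The cutoff $c = k + f$ ($0 \le f < 1$) is chosen so that the size is exactly $\alpha$, using $P(Z \ge k + f) = P(X \ge k + 1) + (1 - f)P(X = k)$. Function names are hypothetical.

```python
from math import comb, floor


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def rejection_probability(c: float, n: int, p: float) -> float:
    """P(X + U >= c) with X ~ Bin(n, p) and U ~ Uniform(0, 1) independent."""
    k = floor(c)
    f = c - k
    if k > n:
        return 0.0
    if k < 0:
        return 1.0
    upper = sum(binom_pmf(x, n, p) for x in range(k + 1, n + 1))
    return upper + (1.0 - f) * binom_pmf(k, n, p)


def exact_size_cutoff(n: int, p0: float, alpha: float) -> float:
    """Cutoff c with rejection_probability(c, n, p0) == alpha, for 0 < alpha <= 1."""
    upper = 0.0  # running value of P(X >= k + 1) under p0
    for k in range(n, -1, -1):
        pk = binom_pmf(k, n, p0)
        if upper + pk >= alpha:
            return k + 1.0 - (alpha - upper) / pk  # lies in [k, k + 1)
        upper += pk
    return 0.0
```

Rejecting for large $Z$ is the randomized Neyman-Pearson test, which is why the procedure yields uniformly most powerful tests against one-sided alternatives $p > p_0$.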


8. Estimation in Truncated Samples. Max Halperin, The Rand Corporation, Santa Monica, California.


A death process is considered which starts with $n$ individuals of zero age, each following the mortality law $f(x, \theta)$. That is,

$F(t) = \Pr\{\text{Age at death} \le t\} = \displaystyle\int_0^t f(x, \theta)\,dx,$

where $f(x, \theta)$ is a probability density. We suppose we truncate the process at a fixed time $T$, and wish to estimate $\theta$ when
a) individuals who die are not replaced, and
b) individuals who die are replaced by individuals of zero age following the mortality law $f(x, \theta)$.

In both cases, it is found that, under mild conditions, estimation by maximum likelihood gives optimum estimates. The estimates are best in the sense of being asymptotically normally distributed and of minimum variance for large samples.

The proofs are given for the case of a single parameter, but can be extended to the multiparameter case. Examples are given.
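As an illustration only, suppose the mortality law is exponential, $f(x, \theta) = \theta e^{-\theta x}$; this choice is ours, not the abstract's. The maximum likelihood estimates then have closed forms. In case a) the $d$ observed deaths contribute their ages and the $n - d$ survivors each contribute exposure $T$, giving $\hat\theta = d / \big(\sum x_i + (n - d)T\big)$. In case b) each of the $n$ positions behaves as a Poisson process of rate $\theta$ (by the memorylessness of the exponential), so with $d$ total deaths $\hat\theta = d/(nT)$.

```python
def mle_without_replacement(death_ages, n, T):
    """Case a): exponential law, deaths not replaced, process stopped at time T."""
    d = len(death_ages)
    if d == 0:
        raise ValueError("no deaths observed: the likelihood has no interior maximum")
    if d > n or any(not 0.0 <= x <= T for x in death_ages):
        raise ValueError("need at most n deaths, each at an age in [0, T]")
    exposure = sum(death_ages) + (n - d) * T  # total time lived before truncation
    return d / exposure


def mle_with_replacement(total_deaths, n, T):
    """Case b): exponential law, each death replaced at age zero, stopped at T."""
    if total_deaths == 0:
        raise ValueError("no deaths observed: the likelihood has no interior maximum")
    return total_deaths / (n * T)
```

For other mortality laws the likelihood equations generally have no closed form and must be solved numerically.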


9. Some Problems in Point Estimation. J. L. Hodges, Jr. and E. L. Lehmann, University of California, Berkeley.


Some point estimation problems are considered in the light of Wald’s general theory. It 
is shown that when the loss function is convex, one may restrict consideration to nonran- 
domized estimates based on sufficient statistics. Minimax estimates are obtained in a 
number of cases connected with the binomial and hypergeometric distributions, and with 
some non-parametric problems. Some prediction problems are also considered. (This paper 
was prepared with the partial support of the Office of Naval Research.) 


10. Completeness in the Sequential Case. E. L. Lehmann and C. Stein, University of California, Berkeley.


Recently, in a series of papers, Girshick, Mosteller, Savage and Wolfowitz have con- 
sidered the uniqueness of unbiased estimates depending only on an appropriate sufficient 
statistic for sequential sampling schemes of binomial variables. A complete solution was 
obtained under the restriction to bounded estimates. This work, which has immediate 
consequences with respect to the existence of unbiased estimates with uniformly minimum 
variance, is extended here in two directions. A general necessary condition for uniqueness 
is found, and this is applied to obtain a complete solution of the uniqueness problem when 
the random variables have a Poisson or rectangular distribution. Necessary and sufficient 
conditions are also found in the binomial case without the restriction to bounded estimates. 
This permits the statement of a somewhat stronger optimum property for the estimates, 
and is applicable to the estimation of unbounded functions of the unknown probability. 


11. The Ratio of Ranges. Richard F. Link, University of Oregon, Eugene.


The distribution of the ratio of two ranges from independent samples drawn from a normal population is given analytically for $n_1$ and $n_2 \le 3$. A table of percentage values, $R$, is given for $\alpha = .005, .01, .025, .05, .10$ and for all combinations of $n_1$ and $n_2$ up to 10, where $\alpha = \Pr(w_1/w_2 > R)$ and $w_1$ and $w_2$ are the observed ranges. (This paper was prepared under the sponsorship of the Office of Naval Research.)


12. Some Problems Arising in Plant Selection and the Use of Analysis of Variance. Stanley W. Nash, University of California, Berkeley.


The yields of many ($m$) varieties are compared in a field trial. A few varieties having the highest and lowest yields in this trial are selected for further testing. What chance is there that the first trial will give a significant result, the second trial not? Let $\xi_i$ denote the true mean yield of the $i$th variety, and assume that the $\xi_i$ are themselves normally, independently distributed with variance $\sigma_1^2$. Let $P_k$ ($k = 1, 2$) denote the probability of a significant result in the $k$th trial, using the $F$-test. For fixed $\sigma_1 > 0$, $\lim_{m\to\infty} P_1 = 1$. (See Nash, Annals of Math. Stat., Vol. 19 (1948), p. 434.) Now let $\sigma_1 > 0$ take on a decreasing sequence of values as $m$ increases. If $\ldots$, then $\lim_{m\to\infty} P_1 = 1$. Here $1 + \sigma_1^2 g(m) = E(\text{numerator of } F)/\sigma_0^2$ ($\sigma_0^2$ = error variance). Also $\lim_{m\to\infty} P_2 < 1$ if and only if $\sigma_1 = O(1/\sqrt{\log m})$. For $\sigma_1 = o(1/\sqrt{\log m})$, $\lim_{m\to\infty} P_2 = \alpha$, the level of significance used. Thus, corresponding to any $m$, however large, one can find values of $\sigma_1$ for which the chances are considerable (or even approaching $1 - \alpha$) that the two field trials will give opposite conclusions when the $F$-test is used.


13. Asymptotic Properties of the Wald-Wolfowitz Test of Randomness. Gottfried E. Noether, Columbia University, New York.


Let $a_1, \ldots, a_n$ be observations on the chance variables $X_1, \ldots, X_n$. Wald and Wolfowitz (Annals of Math. Stat., Vol. 14 (1943), pp. 378-388) have shown how the statistic $R_h = \sum_{i=1}^{n} a_i a_{i+h}$, ($a_{n+i} = a_i$), can be used to test the null hypothesis that the $X_i$, ($i = 1, \ldots, n$), are independently and identically distributed by considering the distribution of $R_h$ in the subpopulation of all permutations of the $a_i$. In the present paper it is shown that when the null hypothesis is true this distribution of $R_h$ is asymptotically normal provided

$\sum_{i=1}^{n} (a_i - \bar a)^r \Big/ \Big[\sum_{i=1}^{n} (a_i - \bar a)^2\Big]^{r/2} = O\big[n^{(2-r)/4}\big], \quad (r = 3, 4, \ldots),$

a condition which is satisfied with probability 1 if the $a_i$ are independent observations on the same chance variable $X$ having positive variance and a finite absolute moment of order $4 + \delta$, ($\delta > 0$). Conditions are given for the consistency of the test based on $R_h$ when under the alternative hypothesis observations are drawn independently from changing populations. In particular a downward trend and a regular cyclical movement are considered, both for ranks and original observations. For the special case of a regular cyclical movement of known length the asymptotic relative efficiency of the rank test with respect to the test performed on original observations is found. It is shown that when using ranks, $R_h$ is asymptotically normal under the alternative hypothesis provided $\liminf_{n\to\infty} \operatorname{var}(n^{-5/2} R_h) > 0$. This asymptotic normality of $R_h$ is used to compare the asymptotic power of the $R_h$-test with that of the Mann $T$-test (Econometrica, Vol. 13 (1945), pp. 245-259) for the case of a downward trend.
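A small check of the permutation framework: under a random ordering of fixed values $a_1, \ldots, a_n$, each of the $n$ products in $R_h$ pairs two distinct positions (for $0 < h < n$), so $E(R_h) = (S_1^2 - S_2)/(n - 1)$ with $S_r = \sum a_i^r$. The sketch enumerates all orderings of a small, made-up sample to confirm this; it only illustrates the permutation distribution and does not reproduce the asymptotic results of the paper.

```python
from fractions import Fraction
from itertools import permutations


def serial_statistic(a, h):
    """R_h = sum_i a_i * a_{i+h} with cyclic indexing a_{n+i} = a_i."""
    n = len(a)
    return sum(a[i] * a[(i + h) % n] for i in range(n))


def permutation_mean(a, h):
    """Exact mean of R_h over all n! orderings of the observations a."""
    values = [serial_statistic(p, h) for p in permutations(a)]
    return Fraction(sum(values), len(values))


def mean_formula(a):
    """(S_1^2 - S_2) / (n - 1), the permutation mean of R_h for 0 < h < n."""
    n = len(a)
    s1 = sum(a)
    s2 = sum(x * x for x in a)
    return Fraction(s1 * s1 - s2, n - 1)
```

Exact rational arithmetic via `Fraction` makes the comparison an equality test rather than a floating-point tolerance check.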


14. On the Similar Regions of a Class of Distributions. Stefan Peters, University of California, Berkeley.


The class of distributions considered is essentially the class of those distributions of $n$ variables which, by a suitable transformation of the variables and the parameter, can be transformed into distributions defined in the whole $R_n$ for which the parameter is a location parameter. These regions satisfy a certain partial differential equation. The transformed distributions of the variables $y_1, \ldots, y_n$ and parameter $\eta$ possess a class $D$ of similar regions with respect to $\eta$ which can be defined as the smallest additive class of regions which includes all regions defined by

$g(y_1 - y_n, \ldots, y_{n-1} - y_n) \ge C,$

where $g$ is a continuous function. The class $D$ does not exhaust all similar regions. There exists among the regions of class $D$ one which is most powerful for testing a given additional parameter $\sigma$. If there exists among all similar regions a most powerful region for testing $\sigma$, then that region will be the most powerful region of class $D$.


15. Some Problems in Sequential Analysis. Charles M. Stein, University of California, Berkeley.


Wald’s fundamental identity for cumulative sums is extended to dependent random 
variables. The first derivative of this at the origin is equivalent to a result of Wolfowitz 
(Annals of Math. Stat., Vol. 18 (1947), p. 228, Th. 7.4). Higher derivatives of this at the 
origin can also be obtained from linear combinations of Wolfowitz’s result applied to suitable 
products of the original random variables. These equations yield approximate OC and ASN 
curves for probability-ratio tests for a simple hypothesis against a single alternative con- 
cerning some of the more usual stationary Markoff chains. Bounds for the amount by 
which the ASN exceeds that of the most efficient test are also obtained. The results are 
applied in particular to random variables taking on only the values 0, 1 with conditional 
probabilities depending only on a finite number of the preceding observations. The case 
of linear dependence of normal random variables with fixed conditional variance is also 
considered. 


16. Some Aspects of Links Between Prediction Problems and Problems of Statistical Estimation. Erling Sverdrup, University of Oslo.


A prediction is not taken as a probability statement about additional observations of the random variable already observed. It is presumed that the statistical interpretation of the sample will result in some action influencing the random variable subject to prediction. The probability distribution of this random variable is given for each of an a priori class of probability functions for the observed random variable and for each of a class of possible actions. "Utility" as a function of the random variable to be predicted and of the action is defined. It is shown that the problem of which action to take in order to maximize expected utility is identical with a problem of statistical inference with a uniquely defined weight function in the Wald sense. It is further shown that this procedure is adaptable to stochastic processes of a general type, and this provides a means of connecting the theory of stochastic processes with the theory of statistical inference. Some examples are given to illustrate the general theory.


17. Some Large Sample Tests for the Median. John E. Walsh, The Rand Corporation, Santa Monica, California.


Consider a large number of independent observations from continuous populations with a common median. Some non-parametric large sample tests for the population median are presented which are based on either two or three order statistics of the sample. If all the populations are symmetrical, these tests are equal-tailed with specified significance level $\alpha$. If the observations are a sample from a normal population, these tests have high power efficiencies. Some tests based on three order statistics are developed which also have significance level $\alpha$ if all the populations are not symmetrical; however, in this case the resulting test is one-tailed instead of equal-tailed. Using these tests for situations where the populations are believed to be symmetrical furnishes a safety factor with respect to Type I error. Tests are presented for the special case where each population is either symmetrical or skewed in a specified direction. If the populations are not symmetrical, the significance level distribution is $.4\alpha$ to one tail and $.6\alpha$ to the other, rather than $.5\alpha$ to each tail. Also some non-parametric large sample tests of whether a sample is from a symmetrical population are derived. These tests are based on three order statistics of the sample and have bounded significance levels.


18. Continuous Sampling Plans from the Risk Point of View. Zivia S. WURTELE, 
Stanford University, California. 


The quality of a lot can be improved by a screening process whereby the defective items 
found during inspection are replaced by non-defective items. The type of sampling plan 
adopted will generally depend upon the cost of inspecting items, the number of defective 
items in the lot prior to inspection, and the loss due to defective items remaining in the lot 
after inspection. The loss if the lot is accepted after d defectives are found in a sample of 
n items is equal to c(n) + h(D) where D is the number of defectives left in the lot and c(7) 
is the cost of inspecting n items. An inspection procedure S is defined by a set of stopping 
points {(d, n)}. Let r(p, S) be the expected loss if p is the probability of a defective and 
the procedure S is used. It is assumed that the lot is obtained from a binomial population. 
For any a priori distribution F(p), a Bayes procedure is one which minimizes the expected
risk,

$$\int_0^1 r(p, S)\, dF(p).$$

A systematic method of obtaining Bayes solutions exists, but the computations are formidable. Under fairly general conditions the Bayes solutions are shown to be multiple sampling
plans, in which the size of the ith sample depends upon the number of defectives in the
(i − 1)st sample. In particular, if the production is in a state of statistical control, a
Bayes solution is a plan with a fixed sample size. It is also shown that for most reasonable loss functions, there exists no minimax procedure which is uniformly better than the trivial one,
namely, the Bayes procedure if p = 1.
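The Bayes criterion above can be illustrated numerically. The following is only a minimal sketch under assumptions chosen for illustration, not the procedure of the paper: it considers only plans that inspect a fixed number n of items from a lot of N, replace every defective found, and accept; it takes D, the number of defectives left, to be binomial with N − n trials and probability p; it represents the a priori distribution F(p) by a finite set of (p, weight) pairs, so that the integral of r(p, S) dF(p) becomes a weighted sum; and the cost c(n) and loss h(D) are supplied by the caller. All function names and the numbers in the demonstration are hypothetical.

```python
from math import comb
from typing import Callable, Sequence, Tuple

# Hypothetical representation for this sketch: a discrete prior F is a list of
# (p, weight) pairs whose weights sum to 1.
Prior = Sequence[Tuple[float, float]]


def expected_h(h: Callable[[int], float], m: int, p: float) -> float:
    """E[h(D)] when D ~ Binomial(m, p): defectives left among m uninspected items."""
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) * h(k) for k in range(m + 1))


def risk(p: float, n: int, lot_size: int,
         c: Callable[[int], float], h: Callable[[int], float]) -> float:
    """r(p, S) for the plan S that inspects n items, replaces the defectives
    found, and accepts the lot; only the lot_size - n uninspected items
    can still contain defectives."""
    return c(n) + expected_h(h, lot_size - n, p)


def bayes_risk(n: int, lot_size: int, prior: Prior,
               c: Callable[[int], float], h: Callable[[int], float]) -> float:
    """Integral of r(p, S) dF(p); for a discrete F it is a weighted sum."""
    return sum(w * risk(p, n, lot_size, c, h) for p, w in prior)


def bayes_fixed_sample_size(lot_size: int, prior: Prior,
                            c: Callable[[int], float],
                            h: Callable[[int], float]) -> int:
    """Fixed sample size n in 0..lot_size with the smallest Bayes risk
    (the smallest such n if there are ties)."""
    return min(range(lot_size + 1),
               key=lambda n: bayes_risk(n, lot_size, prior, c, h))


if __name__ == "__main__":
    prior = [(0.02, 0.5), (0.2, 0.5)]   # illustrative prior, not from the abstract

    def cost(n: int) -> float:           # hypothetical c(n): one unit per item inspected
        return 1.0 * n

    for k in (10.0, 5.0):                # hypothetical linear losses h(D) = k * D
        def loss(d: int, k: float = k) -> float:
            return k * d
        n_best = bayes_fixed_sample_size(50, prior, cost, loss)
        print(k, n_best, bayes_risk(n_best, 50, prior, cost, loss))
```

With c(n) = n, h(D) = 10D and equal weight on p = .02 and p = .2 (so the prior mean of p is .11), the Bayes risk works out to 55 − 0.1n, so full inspection (n = 50) is preferred; with h(D) = 5D it is 27.5 + 0.45n and no inspection is preferred. A linear h always pushes the optimum to an endpoint; a nonlinear h can be passed in without changing the code.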








NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items


Dr. Irving Burr has been promoted to a full professorship at Purdue University. 

Dr. D. A. S. Fraser, who received his Ph.D. degree at Princeton University in
June, has accepted a position as Instructor of Mathematics at the University 
of Toronto. 

Dr. H. K. Hartline, formerly at the Johnson Research Foundation of the Uni-
versity of Pennsylvania, has accepted an appointment as chairman of the Thomas 
Jenkins Department of Biophysics, Johns Hopkins University. 

Dr. Leo Katz has been promoted to an associate professorship in the Mathe- 
matics Department of Michigan State College, East Lansing, Michigan. 

Professor D. D. Kosambi of Tata Institute for Fundamental Research, Bom- 
bay, India served as Visiting Professor at the University of Chicago for the 
Winter Quarter. 

Dr. H. G. Landau has resigned his position with the Ballistic Research Labora- 
tories and is now a Research Associate with the Committee on Mathematical 
Biology, at the University of Chicago. 

Mr. Allen L. Mayerson, formerly an Associate in the Division of Statistics 
and Research of the Institute of Life Insurance at New York City, has accepted 
a position with the National Surety Corporation of New York. 

Mr. Raymond P. Peterson, who has been an Assistant in the Mathematics 
Department of the University of California at Los Angeles and also a graduate 
student there, has accepted a position with the Institute for Numerical Analysis 
at Los Angeles. 

Professor Edwin J. G. Pitman has returned to the Mathematics Department 
of the University of Tasmania after spending about a year and a half in the 
United States. From February to June of 1948 he was at Columbia University 
as Visiting Professor of Mathematical Statistics. The rest of the time was spent 
at North Carolina and Princeton. 

A. Ananthapadmanabha Rau has returned to India after studying at the
Statistical Laboratory in Ames, Iowa. In addition to heading the Department
of Statistics and Agricultural Meteorology of the Government of the State of
Mysore, India, he is working on sampling, design of experiments, and climatology,
and is teaching statistics and climatology at the College of Agriculture.

Dr. Andrew Sobczyk of Watson Laboratories has been appointed to an
assistant professorship at Boston University. 

Assistant Professor 8S. L. Thompson of Alabama Polytechnic Institute has 
been promoted to an associate professorship. 

William J. Youden is acting as Assistant Chief of the Statistical Engineering 
Section of the National Bureau of Standards and as special advisor to the Direc- 
tor on the problems of statistical and mathematical design of major experiments 
in physics, chemistry and engineering. 
Two Doctorates in Mathematical Statistics were awarded at the University of
North Carolina in June, 1949. The recipients were Uttam Chand, who has now 
been appointed Assistant Professor of Mathematics at Boston University, and 
Ralph A. Bradley, who will be Assistant Professor of Mathematics at McGill 
University. 




The Educational Testing Service, Princeton, N. J., announces the appointment 
of Elbert Lee Hoffman and William Edward Kline as ETS Psychometric Fellows 
for 1949-50 for graduate study in psychology at Princeton University. Mr. 
Hoffman is a graduate of the University of Oklahoma, and Mr. Kline has received 
both his bachelor’s and master’s degrees from Yale University. Bert F. Green, Jr.
and Warren S. Torgerson have received reappointments as ETS Psychometric 
Fellows. Each Fellow carries a full program of graduate study in psychology at 
Princeton University, including basic work in experimental and theoretical 
psychology. Special training is also given in mathematical statistics and modern 
quantitative methods as applied to psychological problems in such fields as 
learning, testing and attitude measurement, as well as in the techniques of 
developing aptitude and achievement tests. In addition to the graduate program 
in psychology, each Fellow spends part-time in training and research work with 
the Educational Testing Service. 

Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1949 Pre- 
liminary Actuarial Examinations are as follows: 

First Prize of $200 

Moran, Joseph W. Yale University


Additional Prizes of $160 


Farmer, Thurston P., Jr. State University of Iowa

Haakenstad, Dale L. University of Michigan 

Hauke, William V. University of Michigan 

Lordan, Joseph D. Massachusetts Institute of Technology 
Mayberry, John P. University of Toronto 

Murch, Alan D. University of Toronto 

White, William A. Dartmouth College

Zemach, Ariel Harvard University 


The Society of Actuaries has authorized a similar set of nine prize awards 
for the 1950 Examinations on Part 2. 
The Preliminary Actuarial Examinations consist of the following three 
examinations: 
Part 1. Language Aptitude Examination. 
(Reading comprehension, meaning of words and word relationships, antonyms, 
and verbal reasoning.) 
Part 2. General Mathematics Examination.
(Algebra, trigonometry, coordinate geometry, differential and integral calculus.) 


Part 3. Special Mathematics Examination.


(Finite differences, probability and statistics.)


The 1950 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and 
Canada on May 19, 1950. The closing date for applications is March 15, 1950. 

Detailed information concerning the Examinations can be obtained from: 


The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 

New Members
The following persons have been elected to membership in the Institute 
(March 1, 1949 to May 31, 1949) 


Alcantara de Oliveira, Eduardo, Ph.D., (Univ. de Sao Paulo) Professor, Faculdade de Filo- 
sofia, University of Sao Paulo, Rua Sergipe, 96-Ap. 32, Sao Paulo, Brazil. 

Ashby, Wallace L., A.B. (George Washington Univ.) Agricultural Statistician, 3746 Jocelyn 
Street, Washington 15, D.C. 

Bailey, Edward W., B.Ch. (Ohio State Univ.) Quality Control Supervisor, Carbide and
Carbon Chemicals Corporation, Y-12 Plant, 101 Moylan Lane, Oak Ridge, Tennessee. 

Berger, Agnes P., Ph.D. (Budapest) 10 Park Avenue, New York, New York. 

Brown, Walter C., B.S. (Colorado A&M College) Graduate Assistant, Department of Mathe-
matics, University of Oklahoma, 1130 Trout, Norman, Oklahoma.

Calvin, Lyle D., B.S. (Univ. of Chicago) Research Graduate Assistant, Institute of Statistics,
North Carolina State College, Raleigh, North Carolina. 

Carlyle, Charles G., B.S. (Univ. of Illinois) Graduate student at University of Illinois, 
C-32 Stadium Terrace, Champaign, Illinois. 

Chen, Yu-nien, M.A. (Harvard) Graduate student, Harvard University, 747-60, Apt. D, 
Charter Road, Jamaica 2, New York. 

Clark, Fred J., Jr., B.S. (Colorado A&M College) Graduate Assistant at University of
Illinois, Department of Mathematics, 61 A Court G, Stadium Terrace, Champaign,
Illinois.

Cohen, Samuel E., M.A. (Univ. of Pennsylvania) Statistician, U. S. Bureau of Labor Sta-
tistics, 49 Galveston St., S.W., Washington 20, D.C.

Cole, Randal H., Ph.D. (Univ. of Wisconsin) Associate Professor, University of Western
Ontario, London, Canada. 

Comrey, Andrew L., Ph.D. (Univ. of Southern Calif.) Assistant Professor of Psychology, 
University of Illinois, Urbana, Illinois.

Cook, Ellsworth B., B.S. (Springfield College) Head of Visual Screening Devices Research
and Statistics Facility, U.S. Naval Medical Research Laboratory, Box 45, Submarine
Base, New London, Connecticut.

Cox, David R., Ph.D. (Leeds, England) Statistician, Wool Industries Research Association,
Sunset Avenue, Leeds 6, Yorks, England.

Denbow, Carl H., Ph.D. (Univ. of Chicago) Associate Professor of Mathematics, U.S. 
Naval Postgraduate School, Annapolis, Maryland. 

Dillon, Gregory M., A.B. (Long Island Univ.) Statistician, Pension Statistics Section, 
Treasury Department, E. I. DuPont de Nemours & Co., 1331 Cedar Street, Wilmington,
Delaware. 

Duarte, Geraldo Garcia, Licenciado em Matematica (Faculdade de Filosofia de S. Bento)
Assistant, Faculdade de Higiene e Saude Publica, Caixa Postal 99B, Sao Paulo,
Brazil.

Dudman, John A., B.A. (Reed College) Graduate student, Columbia University, 56 West 
70th St., New York 23, New York. 

Edelson, Howard, B.A. (Ohio State Univ., Columbus, Ohio) Graduate student and Graduate 
Assistant, Ohio State University, 794 S. 18th St., Columbus 6, Ohio. 

Feron, R., Licencié ès Sciences (Univ. of Paris) Research Attaché, 13 rue des Feuillan-
tines, Paris V, France.

Franck, Edward Michel, Ingénieur A.I.A., Professor of the Royal Military School, 104 Rue
Père Devroye, Woluwe St. Pierre, Belgium.

Garritsen, Florence M., B.A. (Univ. of Michigan) Research Assistant, General Motors 
Corp., 5151 Lillibridge Ave., Detroit 18, Michigan. 

Gelsomini, Thea, Ph.D. (Univ. of Bocconi, Milano) Assistant in Statistics, Department
of Statistics, University of Bocconi, Via A. Stoppi, N. 10, Milano, Italy.

Goudswaard, G., Ph.D. (Univ. of Leiden) Director, Permanent Office, International Sta-
tistical Institute and Lecturer of Statistics, Rotterdam School of Economics and Free 
University of Amsterdam, 2 Oostduinlaan, The Hague, Netherlands. 

Gucker, Frank Fulton, A.B. (Harvard Univ.) Statistical Engineer, Remington Arms Co., 
Inc., 3175 Main Street, Bridgeport 6, Connecticut. 

Haberman, Sol, B.A. (Brooklyn College) Assistant Visiting Professor of Sociology, Univer- 
sity of Puerto Rico, 187 Avenida los Flamboyanes, Rio Piedras, Puerto Rico. 

Heimbach, Ernest E., M.B.A. (New York Univ.) Professor of Economics, Bergen College, 
Teaneck, New Jersey, 55 West 11th Street, New York 11, N.Y.

Ishii, Shigeru, B.A. (Univ. of Ill.) Student at University of Illinois, 320-1 Peabody Drive, 
Parade Ground Units, Champaign, Illinois. 

Jackson, James Edward, M.A. (Univ. of N.C.) Statistician, Color Control Dept., Eastman 
Kodak Company, 200 Pershing Drive, Rochester, New York. 

Jaspen, Nathan, Ph.D. (Pennsylvania State College) Research Assistant, Department of 
Psychology, Pennsylvania State College, State College, Pennsylvania. 

Jonhagen, Sven, Fil Lic. (Univ. of Stockholm) Chief Actuary and Assistant Teacher in 
Statistics at the University of Stockholm, Tegnergatan 36, Stockholm, Sweden.
Kiefer, Jack C., M.S. (Mass. Inst. of Tech.) Student, Department of Mathematical Sta-
tistics, Columbia University, 3826 Middleton Avenue, Cincinnati 20, Ohio.

Kraft, Charles Hall, B.A. (Mich. State College) Instructor, Mathematics Department, 
Michigan State College, 707D Chestnut Road, East Lansing, Mich. 

Mayne, John W., M.Sc. (Brown Univ.) Graduate student in Mathematical Statistics, 330 
Furnald Hall, Columbia University, New York 27, New York. 

McCabe, William J., M.A. (George Washington Univ.) Chief Statistician, Transportation 
Corps, Department of the Army, 1725 South Oakland Street, Arlington, Virginia. 

Medin, Knut H., M.A. (Univ. of Uppsala) Assistant, Statistical Institute, University of 
Uppsala, Odinslund 2, Uppsala, Sweden. 

Mewborn, A. Boyd, Ph.D. (Calif. Inst. of Tech.) Associate Professor of Mathematics and 
Mechanics, P.O. Box 1748, Monterey, California. 

Minton, Paul D., M.S. (Southern Methodist Univ.) Graduate student, University of North 
Carolina, P.O. Box 634, Chapel Hill, North Carolina. 

Morris, Doris N., M.A. (Columbia Teachers College) Economics Assistant, Western Electric 
Co., 101 West 72nd St., New York 23, New York. 

Morris, Robert H., B.A. (Swarthmore) Development Engineer, Color Control Department, 
Eastman Kodak Co., Rochester, New York. 







Rajalakshman, D. V., M.Sc. (Madras Univ.) Head, Department of Statistics, University 
of Madras, Madras 5, S. India.

Rudy, Norman, M.B.A. (Univ. of Chicago) Scientist, Ordnance Research Project, Univer- 
sity of Chicago and Instructor in Economics, Roosevelt College, 7/05 S. Crandon
Avenue, Chicago 49, Illinois. 

Sakk, Kaarel, Fil. kand. (Univ. of Stockholm) Officer at Research Bureau of the State Food- 
stuffs Commission, Ostermalmsgatan 67 o.g. ILI, Stockholm, Sweden. 

Singh, Jagjit, B.A. (Punjab Univ.) Superintendent Transportation, E. I. Railway, Dinapore,
c/o B.S. Bugga Esqr., Post Office Box 441, Calcutta, India. 

Starr, Henry H., Ph.D. (Univ. of Vienna) Research Manager, Converted Rice, Inc., P.O.
Box 1752, Houston 1, Texas. 

Sverdrup, Erling, Actuarian (Univ. of Oslo) Lecturer in Mathematical Statistics, Institute
of Mathematics, University of Oslo, Oslo, Norway. 

Talacko, Joseph Y., Ph.D. (Charles Univ., Prague) Assistant Professor of Mathematics, 
Marquette University, 2503 So. 10th Street, Milwaukee 7, Wisconsin. 

Templeton, James G. C., A.M. (Princeton Univ.) Graduate student at Princeton Univer- 
sity, Fine Hall, Princeton University, Princeton, New Jersey. 

Vaughan, Elizabeth, B.S. (Univ. of Washington) Statistician, U.S. Fish and Wildlife Service,
2725 Montlake Boulevard, Seattle 2, Washington. 

Wilkinson, Bryan, M.A. (Univ. of Nebraska) Personnel Research Specialist, Prudential 
Insurance Co., Western Home Office, 4130-B Muirfield Road, Los Angeles 48, Cali-
fornia.

Yevick, Mariam A. L., Ph.D. (Mass. Inst. of Tech.) Staff, Division of Statistical Engineer- 
ing, National Broadcasting System, 921 Hudson St., Hoboken, New Jersey. 





REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The thirty-ninth meeting and fifth regional West Coast meeting of the In- 
stitute of Mathematical Statistics was held on the Berkeley campus of the 
University of California, from Thursday June 16 through Saturday June 18, 
1949. The session on June 17 was held jointly with the Biometrics Section of the 
American Statistical Association and the Biometric Society (Western N. A. 
Region). Sixty-six persons registered, including the following fifty members of 
the Institute: 


Jane F. Andrian, G. A. Baker, Z. Wm. Birnbaum, Colin R. Blyth, Albert H. Bowker, 
Paul T. Bruyere, Chin Long Chiang, Edwin L. Crow, John H. Curtiss, R. C. Davis, Carl 
H. Denbow, W. J. Dixon, Mary Elveback, Mark Eudey, Edward A. Fay, Evelyn Fix, 
William R. Gaffey, H. H. Germond, M. A. Girshick, Jack Gysbers, Max Halperin, J. L. 
Hodges, Jr., John M. Howell, Harry M. Hughes, Cuthbert Hurd, Terry A. Jeeves, Mark 
Kac, H. S. Konijn, George M. Kuznets, Erich L. Lehmann, Richard F. Link, Michel Loève,
Frank Massey, Lincoln E. Moses, Edith Mourier, Stanley W. Nash, J. Neyman, Edward 
Paulson, Stefan Peters, Raymond P. Peterson, Robert I. Piper, Gladys Rappaport, Mina 
Rees, David Rubinstein, Elizabeth L. Scott, Esther Seiden, Charles M. Stein, John E. 
Walsh, John Wishart, Zivia S. Wurtele. 


Those attending were welcomed at the Thursday morning session by Edward
W. Strong, Associate Dean of the College of Letters and Science, University of
California. Professor Z. William Birnbaum of the University of Washington
presided.
The program was as follows: 
1. Recent advances in the theory of the Wishart distribution. (Invited paper.) 
John Wishart, Cambridge University. 
2. Bayes, minimax, and other approaches to the multiple classification problem. 
(Invited paper.) M. A. Girshick, Stanford University. 
3. Some problems in sequential analysis. Charles M. Stein, University of 
California, Berkeley. 

Professor Jerzy Neyman of the University of California, Berkeley, presided 
at the Thursday afternoon session. Midway in the program there was an inter- 
mission for a tea given by the Statistical Laboratory, University of California. 
The program was as follows: 


1. Completeness in the sequential case. E. L. Lehmann and C. M. Stein, University of
California, Berkeley.
2. Some large sample tests for the median. John E. Walsh, The Rand Corporation.
3. Continuous sampling plans from the risk point of view. Zivia S. Wurtele, Stanford
University.
4. Some problems in point estimation. J. L. Hodges, Jr. and E. L. Lehmann, University
of California, Berkeley.
5. Minimum variance in non-regular estimation. R. C. Davis, U. S. Naval Ordnance Test
Station, Inyokern.
6. Some aspects of links between prediction problems and problems of statistical estimation.
Erling Sverdrup, University of Oslo.
7. Extension of a theorem of Blackwell. (By title). Edward W. Barankin, University of
California, Berkeley. 

8. Some two-sample tests. (By title). Douglas G. Chapman, University of California, 
Berkeley. 

9. On the existence of consistent tests. (By title). Agnes Berger, Columbia University. 


Professor F. W. Weymouth of Stanford University presided at the Friday 
morning session on biometrics. The program was as follows: 


1. Statistical problems arising from research in tuberculosis. Martha and Paul T. Bruyere,
U. S. Public Health Service.
2. Correlation of variability with growth rate in fish and mollusks. F. W. Weymouth,
Stanford University.
3. Some problems arising in plant selection and the use of analysis of variance. Stanley
W. Nash, University of California, Berkeley.
4. Studies of resistance of strawberry varieties and selections to verticillium wilt. R. E. Baker
and G. A. Baker, University of California, Davis.
5. A uniformity trial on unirrigated barley of ten years' duration with implications for
field trial designs. F. J. Veihmeyer, M. R. Huberty, and G. A. Baker, University of
California, Davis and Los Angeles.


On Friday afternoon those attending the meeting were entertained at a picnic 
luncheon at Stanford University, given by the Department of Statistics, Stanford 
University. 

Professor C. B. Morrey, Jr., of the University of California, Berkeley, presided
at the Saturday morning session. The program consisted of the following invited
papers:


1. Methods for getting limiting distributions. Mark Kac, Cornell University. 
2. Almost certain convergence. Michel Loève, University of California, Berkeley.


At 11 o’clock Saturday morning a business session was held, under the chair- 
manship of Professor Jerzy Neyman of the University of California, Berkeley, 
for the purpose of discussing future West Coast meetings. Plans for reviving the 
Statistical Research Memoirs were also discussed. 

On Saturday afternoon a final session for contributed papers was held under 
the chairmanship of Professor Albert H. Bowker of Stanford University. The 
program was as follows: 


1. Effect of linear truncation in a multinormal population. Z. William Birnbaum, Univer-
sity of Washington.
2. Estimation in truncated samples. Max Halperin, The Rand Corporation.
3. On the similar regions of a class of distributions. Stefan Peters, University of California,
Berkeley.
4. Auxiliary random variables. Mark W. Eudey, California Municipal Statistics.
5. The ratio of ranges. Richard F. Link, University of Oregon.
6. Statistical problems in the theory of Geiger counters. Colin R. Blyth, University of
California, Berkeley.
7. Asymptotic properties of the Wald-Wolfowitz test of randomness. (By title). Gottfried
E. Noether, Columbia University.


J. L. Hodges, Jr.
Assistant Secretary 





